kyle 1 week ago
parent
commit
88d654d51e

BIN
5月6日数据.xlsx


BIN
5月6日质检测试_v2.pptx


+ 10 - 1
generate-data-report-ppt/SKILL.md

@@ -48,7 +48,8 @@ generate-data-report-ppt/
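 │   ├── data-schema.md              # Excel field mapping and validation rules
 │   ├── report-structures.md        # Page structures for daily/weekly/monthly reports
 │   ├── chart-specs.md              # Native chart types, color schemes, data binding
-│   └── visual-style-guide.md       # Layout, fonts, color schemes
+│   ├── visual-style-guide.md       # Layout, fonts, color schemes
+│   └── professional-data-analyst-playbook.md # Insight standards for professional data analysts
 └── assets/
     ├── report-master.pptx          # Daily report template (cover, content page, table of contents, end page)
     ├── weekly-master.pptx          # Weekly report template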
@@ -84,6 +85,12 @@ generate-data-report-ppt/
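 - Generate page-structure suggestions (including conclusion titles and insight copy templates)
 - All recommendations must be confirmed by the user before being injected into ReportConfig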
 
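+When generating metric recommendations, page structures, per-page insights, summary pages, forecast pages, or quality reviews, always consult
+`references/professional-data-analyst-playbook.md`. The agent's role is a professional data analyst, not a template filler:
+- Every analysis page must include at least three of: business judgment, data evidence, comparison, cause hypothesis or mechanism explanation, risk/opportunity, action recommendation.
+- From page 4 onward in particular, a plain data summary is prohibited; the page must spell out structure, trend, concentration, conversion, gaps, anomalies, risks, and next actions.
+- Long category lists must not be stuffed into KPI values or body paragraphs; use Top N plus a summary of the rest.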
+
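 ### Page Layout Engine (page_layouts.py)
 - Predefined layout templates: KPI grid, chart left + insight right, two columns, two rows, card grid, full width
 - `calculate_content_area()` computes the available content area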
@@ -167,3 +174,5 @@ Use `ConfirmationSpec` on `ReportConfig.user_confirmation` to record completion.
 Data profiling serves the confirmed business intent. It should map the confirmed metrics and dimensions to actual Excel columns, then select feasible pages and layouts. It must not invent a different business focus when the user has already confirmed the core metrics.
 
 For visual quality, treat master PPTX files as style assets, not rigid page contracts. If a template placeholder cannot be populated, remove the whole placeholder component. If a KPI grid consumes the available vertical space, do not add bottom insight text; use a later analysis page or a different layout instead.
+
+For analytical quality, load `references/professional-data-analyst-playbook.md` before generating or reviewing report narratives. A page that only restates totals, rankings, or category names without comparison, diagnosis, implication, or action is not acceptable even if layout quality passes.
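
Review note: the Top N rule added to SKILL.md above can be sketched in a few lines. This is a minimal illustration, not code from this commit; `summarize_top_n` is a hypothetical helper, the sample numbers are made up, and the summary string follows the playbook's "其余 X 项合计 Y,占比 Z%" wording.

```python
from typing import Dict, List, Tuple


def summarize_top_n(values: Dict[str, float], n: int = 5) -> Tuple[List[Tuple[str, float]], str]:
    """Return the top-n (category, value) pairs plus a one-line summary of the rest.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    # Rank categories by value, largest first.
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if not rest:
        # Nothing was cut off, so no "rest" summary is needed.
        return top, ''
    total = sum(values.values())
    rest_total = sum(v for _, v in rest)
    share = rest_total / total * 100 if total else 0.0
    # Mirrors the playbook wording "其余 X 项合计 Y,占比 Z%".
    return top, f'其余 {len(rest)} 项合计 {rest_total:g},占比 {share:.1f}%'


# Made-up sample data: seven categories, only the top five are shown.
top, rest_summary = summarize_top_n(
    {'A': 50, 'B': 20, 'C': 10, 'D': 8, 'E': 6, 'F': 4, 'G': 2}, n=5
)
print(top)
print(rest_summary)
```

Returning the summary as a separate string keeps it out of the KPI value slot, which is the failure mode the rule is meant to prevent.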

+ 313 - 0
generate-data-report-ppt/references/professional-data-analyst-playbook.md

@@ -0,0 +1,313 @@
+# Professional Data Analyst Playbook
+
+Use this reference whenever generating report recommendations, page narratives, chart interpretations, executive summaries, forecast pages, or quality review feedback. The agent must behave like a professional data analyst, not a template filler.
+
+## Analyst Role
+
+The agent is responsible for turning data into decision-ready analysis:
+
+- Identify business questions behind the report, not only visible columns.
+- Translate metrics into management implications.
+- Compare current performance with targets, prior period, peers, structure, and expected ranges whenever data permits.
+- Explain why a number matters, what changed, what likely caused it, and what action should follow.
+- Make uncertainty explicit. If evidence is insufficient, state the missing evidence and the next data needed.
+- Avoid generic phrases such as "总体表现良好", "需进一步关注", "持续优化", "建议加强管理" unless backed by specific data and action.
+
+Every analysis page must answer at least three of these five questions:
+
+1. What happened?
+2. How large is the change or gap?
+3. Why might it have happened?
+4. What risk or opportunity does it imply?
+5. What should the audience do next?
+
+## Analyst Keywords
+
+Use these keywords to trigger deeper analytical thinking. Do not merely paste them into slides; use them to structure reasoning.
+
+### Metric Diagnosis
+
+- 环比、同比、较上期、较同期、较目标
+- 达成率、缺口、超额、偏离度、贡献率
+- 增量、减量、净变化、绝对变化、相对变化
+- 均值、中位数、分位数、峰值、低谷、波动率
+- 标准差、变异系数、离散度、集中度、长尾
+- 异常值、离群点、结构突变、拐点、趋势斜率
+
+### Business Interpretation
+
+- 增长驱动、拖累因素、核心贡献、边际贡献
+- 结构升级、结构失衡、结构迁移、结构性机会
+- 漏斗转化、阶段阻塞、流程瓶颈、转化效率
+- 存量消化、新增拉动、复购支撑、客户质量
+- 资源利用、产能约束、履约压力、库存风险
+- 需求强度、交付节奏、回款节奏、供应约束
+
+### Risk And Opportunity
+
+- 短期风险、中期压力、长期隐患
+- 集中度风险、单点依赖、尾部拖累、断层
+- 增长机会、修复空间、放量潜力、效率提升
+- 预警阈值、触发条件、风险敞口、影响范围
+- 保底情景、基准情景、挑战情景、压力测试
+
+### Action Language
+
+- 优先级、责任人、时间节点、复盘频率
+- 分层运营、重点跟进、专项排查、闭环机制
+- 资源倾斜、策略校准、流程再设计、口径复核
+- 建立看板、设置阈值、跟踪转化、校准预测
+- 立即处理、下周复盘、月末验收、滚动更新
+
+## Required Insight Pattern
+
+Each insight block should follow this structure:
+
+```text
+结论: 用一句话讲清楚业务判断。
+证据: 引用具体指标、数值、排名、占比、变化或差距。
+解释: 说明可能原因或业务机制。
+影响: 点明风险、机会、资源压力或管理含义。
+动作: 给出具体下一步,最好包含对象、优先级和时间。
+```
+
+Short form for PPT:
+
+```text
+【判断】...;【证据】...;【原因】...;【影响】...;【动作】...
+```
+
+Use compact prose on slides, but make the logic complete.
+
+## Page-Level Standards
+
+### KPI Overview
+
+Do not simply list KPI values. Analyze:
+
+- Which KPI is the primary result metric?
+- Which metrics are leading indicators and which are lagging indicators?
+- Are result and process indicators moving consistently?
+- Which metric has the largest gap, fastest growth, or highest operational risk?
+- If values are high but process indicators are weak, call out sustainability risk.
+
+Minimum output:
+
+- 1 paragraph for overall performance judgment.
+- 1 paragraph for key driver or drag.
+- 1 paragraph for management action or monitoring rule.
+
+### Trend Page
+
+Analyze trend shape, not just direction:
+
+- Identify acceleration, deceleration, plateau, turning point, volatility, peak, trough.
+- Compare early/middle/late period if exact prior period is unavailable.
+- Explain whether the trend is structural, seasonal, event-driven, or data-quality-driven.
+- Translate trend into forecast implication.
+
+Useful terms:
+
+- 趋势斜率、拐点、峰谷差、连续增长、连续回落
+- 上旬/中旬/下旬对比、阶段性修复、波动放大
+- 趋势延续性、预测可信度、节奏错配
+
+### Distribution Page
+
+Analyze structure:
+
+- Head concentration: Top 1 / Top 3 / Top 5 contribution.
+- Tail distribution: number of low-contribution categories and their combined share.
+- Balance: whether the distribution is healthy, overly concentrated, or fragmented.
+- Operational implication: where to allocate resources.
+
+Useful terms:
+
+- 头部集中、长尾分散、结构失衡、结构迁移
+- 贡献梯队、帕累托结构、尾部低效、资源错配
+
+### Ranking Page
+
+Ranking is not a list. Analyze:
+
+- Gap between rank 1 and rank 2.
+- Gap between top tier and bottom tier.
+- Whether leaders are outliers or part of a stable first tier.
+- What action differs by tier: protect leaders, grow second tier, fix tail.
+
+Useful terms:
+
+- 第一梯队、第二梯队、尾部梯队、断层
+- 榜首优势、追赶空间、低位修复、标杆复制
+
+### Funnel Or Stage Page
+
+Analyze conversion and blockage:
+
+- Largest stock stage.
+- Weakest conversion point.
+- Average cycle time or aging if available.
+- Impact of blockage on revenue, delivery, or customer experience.
+- Priority actions by stage.
+
+Useful terms:
+
+- 阶段阻塞、转化断点、漏斗泄漏、推进效率
+- 存量堆积、老化风险、闭环周期、交付压力
+
+### Team Or Owner Page
+
+Analyze workload, effectiveness, and risk:
+
+- Workload distribution and concentration.
+- Output per person or per team if denominator exists.
+- Identify over-loaded owners and under-utilized owners.
+- Separate high volume from high efficiency.
+
+Useful terms:
+
+- 人均产出、负载均衡、单点依赖、能力梯队
+- 高负载风险、协同效率、资源重分配
+
+### Forecast Or Plan Page
+
+Forecast pages must include:
+
+- Forecast value or target value.
+- Baseline evidence from actual performance.
+- Key assumptions.
+- Gap between current run rate and forecast.
+- Scenario view: conservative / base / stretch if possible.
+- Risk response if forecast is not supported by current data.
+
+Useful terms:
+
+- 预测区间、目标缺口、运行速率、目标可行性
+- 关键假设、情景分析、压力测试、偏差校准
+
+### Summary Page
+
+Do not restate previous pages. Synthesize:
+
+- Top 3 findings by business impact.
+- Main risk and its trigger condition.
+- Main opportunity and expected upside.
+- Next operating cadence: what to track daily/weekly/monthly.
+
+## Analysis Depth Checklist
+
+Before writing a slide, check:
+
+- Does the page contain at least one concrete number?
+- Does it contain at least one comparison?
+- Does it explain a cause or plausible mechanism?
+- Does it mention impact on business decisions?
+- Does it recommend a specific action?
+
+If any answer is no, revise the analysis.
+
+## Comparison Hierarchy
+
+Use the strongest available comparison:
+
+1. Target or budget.
+2. Previous period.
+3. Same period last year.
+4. Segment benchmark, team benchmark, region benchmark, category benchmark.
+5. Internal structure: Top vs tail, high vs low, early vs late period.
+6. Statistical baseline: mean, median, percentile, standard deviation.
+7. If none exists, explicitly say the page is a baseline view and propose the next comparison field to add.
+
+## Cause Hypothesis Library
+
+Use hypotheses cautiously. Mark them as hypotheses unless directly supported by data.
+
+### Growth
+
+- Demand expansion.
+- New customer/order inflow.
+- High-performing region or product mix shift.
+- Improved conversion or faster processing.
+- Delivery capacity release.
+- Campaign, seasonality, or policy effect.
+
+### Decline
+
+- Demand weakening.
+- Data cut-off or reporting lag.
+- Stage blockage or approval delay.
+- Customer payment delay.
+- Supply, logistics, production, inventory, or staffing constraint.
+- High base effect from prior period.
+
+### Concentration
+
+- Key account dependence.
+- Regional market skew.
+- Product mix concentration.
+- Resource allocation bias.
+- Sales owner or channel dependence.
+
+### Volatility
+
+- Small sample size.
+- One-off large order/event.
+- Calendar effect.
+- Batch data entry.
+- Irregular fulfillment schedule.
+
+## Writing Rules
+
+Use precise, executive-ready Chinese:
+
+- Prefer: "本月订单量较上期增加 18%,其中 Top3 国家贡献 62% 增量,说明增长主要由头部市场拉动。"
+- Avoid: "本月订单表现较好,后续需持续关注。"
+
+Use decision verbs:
+
+- "优先处理", "拆解", "复核", "校准", "压降", "放大", "转化", "闭环", "预警", "复盘".
+
+Avoid empty verbs:
+
+- "加强", "优化", "提升", "关注", unless followed by object + metric + deadline.
+
+## Empty Analysis Anti-Patterns
+
+Reject and rewrite these:
+
+- Only describing chart appearance.
+- Only repeating the largest category.
+- Only listing all categories or countries.
+- Saying "数据较为均衡" without concentration metrics.
+- Saying "存在波动" without peak/trough/change range.
+- Saying "建议继续跟进" without owner, priority, metric, or timing.
+- Writing a long paragraph without any number.
+
+## Slide Density Guidance
+
+Good analysis does not mean long text. A strong PPT page usually has:
+
+- 1 clear conclusion title.
+- 1 chart or KPI group.
+- 2-4 insight blocks.
+- 3-6 specific numbers across the page.
+- No raw category list longer than 5 items. Use Top N + "其余" summary.
+
+When a category list is too long:
+
+- Show Top 5 only.
+- Add "其余 X 项合计 Y,占比 Z%".
+- Move full detail to appendix or table.
+- Never put a long category list inside a KPI value box.
+
+## Quality Self-Review For Analysis
+
+Before final PPT output, inspect pages from page 4 onward especially carefully:
+
+- Does each page contain a business judgment beyond summary?
+- Does each chart have a written interpretation?
+- Are risks and actions specific?
+- Are long category labels abbreviated or moved into a chart/table?
+- Are all claims traceable to visible numbers or source data?
+
+If a page is mostly generic text, rebuild the page narrative before output.
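
Review note: the playbook's short form (【判断】…;【证据】…;【原因】…;【影响】…;【动作】…) lends itself to a mechanical pre-check. The sketch below is an assumption about how such a check could look, not something this commit implements; `parse_short_form` and `has_enough_elements` are hypothetical names, and applying the "at least three" threshold to the five tags is my reading of the playbook. It only verifies that tagged elements are present and non-empty, not that the analysis is any good.

```python
import re

# Tags taken from the playbook's short form for PPT insight blocks.
INSIGHT_TAGS = ('判断', '证据', '原因', '影响', '动作')


def parse_short_form(text: str) -> dict:
    """Split a short-form insight string into {tag: content} for known, non-empty tags."""
    result = {}
    # Each match is a 【tag】 followed by everything up to the next 【.
    for tag, content in re.findall(r'【([^】]+)】([^【]*)', text):
        content = content.strip().strip(';;').strip()
        if tag in INSIGHT_TAGS and content:
            result[tag] = content
    return result


def has_enough_elements(text: str, minimum: int = 3) -> bool:
    """True when at least `minimum` distinct tagged elements carry content."""
    return len(parse_short_form(text)) >= minimum


# Made-up sample sentences in the playbook's style.
good = '【判断】增长由头部市场拉动;【证据】Top3 国家贡献 62% 增量;【动作】下周复盘尾部市场'
weak = '【判断】本月订单表现较好'
print(has_enough_elements(good))  # True
print(has_enough_elements(weak))  # False
```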

+ 57 - 1
generate-data-report-ppt/scripts/agent_analyzer.py

@@ -18,11 +18,45 @@ def analyze_and_recommend(profile: dict, period_type: PeriodType = PeriodType.MO
         'data_summary': _build_summary(profile),
         'chart_mapping': _build_chart_mapping(profile),
         'analysis_notes': _build_analysis_notes(profile),
+        'analyst_requirements': _build_analyst_requirements(profile, period_type),
     }
     recommendations.update(_suggest_period_and_range(profile))
     return recommendations
 
 
+def _build_analyst_requirements(profile: dict, period_type: PeriodType) -> dict:
+    """Requirements for decision-grade analysis narratives."""
+    num_cols = profile.get('numeric_columns', [])
+    cat_cols = profile.get('category_columns', [])
+    time_cols = profile.get('time_columns', [])
+    requirements = [
+        '每个分析页至少包含业务判断、数据证据、对比关系、原因假设、风险/机会、行动建议中的三项。',
+        '从第4页开始禁止仅复述图表或排行,必须输出诊断、归因、影响和下一步动作。',
+        '长分类列表必须压缩为Top N + 其余汇总,不能塞入KPI值或正文长段落。',
+        '若缺少目标/历史/同比数据,需明确说明当前为基线视图,并提出下一步应补充的对比字段。',
+    ]
+
+    if time_cols:
+        requirements.append('趋势页必须识别峰值、低谷、拐点、阶段变化或波动区间。')
+    if cat_cols:
+        requirements.append('分布/排行页必须分析Top1/Top3贡献、头部集中度、尾部结构和资源配置含义。')
+    if len(num_cols) >= 2:
+        requirements.append('KPI页需区分结果指标与过程指标,分析指标之间是否一致或存在背离。')
+    if period_type == PeriodType.MONTHLY:
+        requirements.append('月报必须包含月度经营判断、关键驱动/拖累、风险预警和下月行动计划。')
+
+    return {
+        'role': 'professional_data_analyst',
+        'reference': 'references/professional-data-analyst-playbook.md',
+        'minimum_requirements': requirements,
+        'keywords': [
+            '环比', '同比', '达成率', '缺口', '贡献率', '集中度', '长尾',
+            '拐点', '波动率', '结构失衡', '转化效率', '阶段阻塞',
+            '风险敞口', '关键假设', '情景分析', '预警阈值', '闭环机制',
+        ],
+    }
+
+
 def _recommend_metrics(profile: dict) -> list[dict]:
     metrics = []
     num_cols = profile.get('numeric_columns', [])
@@ -131,6 +165,28 @@ def _recommend_pages(profile: dict, period_type: PeriodType) -> list[dict]:
         })
         order += 1
 
+    if period_type == PeriodType.MONTHLY:
+        forecast_cols = [
+            c for c in num_cols
+            if any(k in (c.get('column_name', '') + c.get('inferred_label', '')).lower()
+                   for k in ('预测', 'forecast', '目标', 'target', '计划', 'plan'))
+        ]
+        if forecast_cols:
+            pages.append({
+                'page_id': 'monthly_forecast',
+                'title': '下月预测与行动计划',
+                'page_type': 'monthly_forecast',
+                'order': order,
+                'selected': True,
+                'elements': [{
+                    'type': 'forecast_chart',
+                    'metrics': [c['column_name'] for c in forecast_cols[:3]],
+                    'title': '下月预测与行动计划',
+                }],
+                'conclusion_title': '下月预测与行动计划',
+            })
+            order += 1
+
     cat_cols = profile.get('category_columns', [])
     if cat_cols and num_cols:
         top_cat = cat_cols[0]
@@ -357,4 +413,4 @@ if __name__ == '__main__':
     recs = analyze_and_recommend(profile, PeriodType.MONTHLY)
     prompts = generate_interaction_prompts(recs, profile)
     for k, v in prompts.items():
-        print(f"\n{k}: {v['question']}\n{v['detail']}")
+        print(f"\n{k}: {v['question']}\n{v['detail']}")
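
Review note: the forecast-page trigger in `_recommend_pages` is a plain substring match over `column_name` plus `inferred_label`. The standalone sketch below reproduces that matching so its behavior can be checked in isolation; `find_forecast_columns` is a hypothetical wrapper, and the sample columns are made up.

```python
# Keyword tuple copied from the diff above.
FORECAST_KEYWORDS = ('预测', 'forecast', '目标', 'target', '计划', 'plan')


def find_forecast_columns(num_cols: list) -> list:
    """Return the column dicts whose name or label contains a forecast keyword."""
    matched = []
    for col in num_cols:
        # Same concatenate-then-lowercase approach as the diff.
        haystack = (col.get('column_name', '') + col.get('inferred_label', '')).lower()
        if any(keyword in haystack for keyword in FORECAST_KEYWORDS):
            matched.append(col)
    return matched


# Made-up profile columns for illustration.
sample_cols = [
    {'column_name': 'Sales_Target', 'inferred_label': '销售目标'},
    {'column_name': 'order_count', 'inferred_label': '订单量'},
    {'column_name': '下月预测'},
    {'column_name': 'planet_id', 'inferred_label': ''},
]
print([c['column_name'] for c in find_forecast_columns(sample_cols)])
# ['Sales_Target', '下月预测', 'planet_id']
```

The last match shows a possible false positive: because `'plan'` is matched as a substring, a name such as `planet_id` also qualifies. Whether that matters depends on the column names seen in practice; a word-boundary match would be one way to tighten it.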

+ 95 - 8
generate-data-report-ppt/scripts/data_loader.py

@@ -341,6 +341,84 @@ def normalize_column_names(col_name: str) -> str:
     return name
 
 
+def _is_unnamed_column(col) -> bool:
+    return str(col).strip().lower().startswith('unnamed')
+
+
+def _dedupe_columns(columns) -> list:
+    seen = {}
+    result = []
+    for idx, col in enumerate(columns):
+        name = normalize_column_names(str(col).strip()) if col is not None else ''
+        if not name or name.lower() in ('nan', 'none') or _is_unnamed_column(name):
+            name = f'column_{idx + 1}'
+        base = name
+        count = seen.get(base, 0)
+        if count:
+            name = f'{base}_{count + 1}'
+        seen[base] = count + 1
+        result.append(name)
+    return result
+
+
+def _row_header_score(values) -> float:
+    cells = [str(v).strip() for v in values if pd.notna(v) and str(v).strip()]
+    if not cells:
+        return 0
+    non_numeric = 0
+    unique = set()
+    for cell in cells:
+        unique.add(cell)
+        try:
+            float(cell.replace(',', '').replace('%', ''))
+            is_numeric = True
+        except Exception:
+            is_numeric = False
+        if not is_numeric:
+            non_numeric += 1
+    return len(cells) + non_numeric * 1.5 + len(unique) * 0.5
+
+
+def _detect_header_row(raw_df: pd.DataFrame, max_scan_rows: int = 8) -> int:
+    if raw_df.empty:
+        return 0
+    scan_rows = min(max_scan_rows, len(raw_df))
+    best_idx = 0
+    best_score = -1
+    for idx in range(scan_rows):
+        row = raw_df.iloc[idx]
+        score = _row_header_score(row)
+        next_non_empty = 0
+        if idx + 1 < len(raw_df):
+            next_non_empty = raw_df.iloc[idx + 1].notna().sum()
+        # Prefer rows with many non-numeric labels and data underneath.
+        score += min(next_non_empty, len(raw_df.columns)) * 0.2
+        if score > best_score:
+            best_score = score
+            best_idx = idx
+    return best_idx
+
+
+def _dataframe_from_detected_header(raw_df: pd.DataFrame, header_row='auto') -> pd.DataFrame:
+    raw_df = raw_df.dropna(how='all').reset_index(drop=True)
+    raw_df = raw_df.dropna(axis=1, how='all')
+    if raw_df.empty:
+        return raw_df
+
+    if header_row == 'auto':
+        header_idx = _detect_header_row(raw_df)
+    elif header_row is None:
+        header_idx = 0
+    else:
+        header_idx = int(header_row)
+
+    header_idx = max(0, min(header_idx, len(raw_df) - 1))
+    columns = _dedupe_columns(raw_df.iloc[header_idx].tolist())
+    df = raw_df.iloc[header_idx + 1:].copy().reset_index(drop=True)
+    df.columns = columns
+    return df
+
+
 def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.DataFrame:
     """
     Universal DataFrame cleaning:
@@ -357,7 +435,9 @@ def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.Dat
     df = df.dropna(how='all').reset_index(drop=True)
     df = df.dropna(axis=1, how='all')
 
-    df = df.loc[:, ~df.columns.astype(str).str.contains('^Unnamed', na=False)]
+    unnamed_mask = df.columns.map(_is_unnamed_column)
+    if unnamed_mask.any() and (~unnamed_mask).any():
+        df = df.loc[:, ~unnamed_mask]
 
     df = df.rename(columns=normalize_column_names)
 
@@ -368,7 +448,10 @@ def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.Dat
     for col in df.columns:
         if df[col].dtype == 'object':
             try:
-                parsed = pd.to_datetime(df[col], errors='coerce', infer_datetime_format=True)
+                try:
+                    parsed = pd.to_datetime(df[col], errors='coerce', format='mixed')
+                except TypeError:
+                    parsed = pd.to_datetime(df[col], errors='coerce')
                 if parsed.notna().sum() > len(df) * 0.7:
                     df[col] = parsed
                     continue
@@ -385,7 +468,7 @@ def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.Dat
 
 
 def load_generic_excel(filepath: str, sheet_name=0, skip_summary_rows=True,
-                       encoding=None, dtype_backend=None) -> pd.DataFrame:
+                       encoding=None, dtype_backend=None, header_row='auto') -> pd.DataFrame:
     """
     Load any Excel/CSV file into a cleaned DataFrame.
 
@@ -395,6 +478,7 @@ def load_generic_excel(filepath: str, sheet_name=0, skip_summary_rows=True,
         skip_summary_rows: Auto-detect and remove summary/total footer rows
         encoding: File encoding (auto-detected for CSV if None)
         dtype_backend: Optional pandas dtype backend ('numpy_nullable', 'pyarrow')
+        header_row: Excel header row index or 'auto'. CSV keeps its native header.
     """
     fmt = auto_detect_file_format(filepath)
 
@@ -406,7 +490,8 @@ def load_generic_excel(filepath: str, sheet_name=0, skip_summary_rows=True,
         df = load_generic_csv(filepath, encoding=encoding, **kwargs)
     else:
         try:
-            df = pd.read_excel(filepath, sheet_name=sheet_name, **kwargs)
+            raw_df = pd.read_excel(filepath, sheet_name=sheet_name, header=None, **kwargs)
+            df = _dataframe_from_detected_header(raw_df, header_row=header_row)
         except Exception as e:
             if fmt == 'xls':
                 raise ValueError(
@@ -418,7 +503,7 @@ def load_generic_excel(filepath: str, sheet_name=0, skip_summary_rows=True,
     return _clean_generic_dataframe(df, skip_summary_rows=skip_summary_rows)
 
 
-def load_generic_all_sheets(filepath: str, skip_summary_rows=True) -> pd.DataFrame:
+def load_generic_all_sheets(filepath: str, skip_summary_rows=True, header_row='auto') -> pd.DataFrame:
     """
     Load all sheets from an Excel file, merge into a single DataFrame.
     Adds '_source_sheet' column to track the source sheet.
@@ -429,13 +514,15 @@ def load_generic_all_sheets(filepath: str, skip_summary_rows=True) -> pd.DataFra
 
     xl = pd.ExcelFile(filepath)
     if len(xl.sheet_names) == 1:
-        df = pd.read_excel(filepath, sheet_name=xl.sheet_names[0])
+        raw_df = pd.read_excel(filepath, sheet_name=xl.sheet_names[0], header=None)
+        df = _dataframe_from_detected_header(raw_df, header_row=header_row)
         return _clean_generic_dataframe(df, skip_summary_rows=skip_summary_rows)
 
     frames = []
     for sheet in xl.sheet_names:
         try:
-            df = pd.read_excel(filepath, sheet_name=sheet)
+            raw_df = pd.read_excel(filepath, sheet_name=sheet, header=None)
+            df = _dataframe_from_detected_header(raw_df, header_row=header_row)
             df['_source_sheet'] = sheet
             frames.append(df)
         except Exception:
@@ -509,4 +596,4 @@ if __name__ == '__main__':
         date_col = auto_detect_date_column(df)
         print(f"Generic load: {len(df)} rows x {len(df.columns)} cols, "
               f"date column: {date_col}")
-        print(f"Columns: {list(df.columns)}")
+        print(f"Columns: {list(df.columns)}")
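
Review note: the header detection added to `data_loader.py` scores each candidate row by its count of non-empty, non-numeric, and unique cells, with a small bonus when the following row has data. The sketch below restates that heuristic in pure Python over lists of lists so it can be tried without pandas. It is a simplified illustration, not the committed code: empty cells are represented by `None`, fully empty rows are not dropped first, and the sample sheet is made up.

```python
def row_header_score(values) -> float:
    """Score a row: more non-empty, non-numeric, and unique cells look more like a header."""
    cells = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cells:
        return 0.0
    non_numeric = 0
    for cell in cells:
        try:
            # Treat "1,200" and "35%" as numeric, as the diff does.
            float(cell.replace(',', '').replace('%', ''))
        except ValueError:
            non_numeric += 1
    return len(cells) + non_numeric * 1.5 + len(set(cells)) * 0.5


def detect_header_row(rows, max_scan_rows: int = 8) -> int:
    """Return the index of the best-scoring row among the first max_scan_rows rows."""
    best_idx, best_score = 0, -1.0
    width = max((len(r) for r in rows), default=0)
    for idx, row in enumerate(rows[:max_scan_rows]):
        score = row_header_score(row)
        if idx + 1 < len(rows):
            # Small bonus when there is data directly underneath.
            next_non_empty = sum(1 for v in rows[idx + 1] if v is not None)
            score += min(next_non_empty, width) * 0.2
        if score > best_score:  # strict '>' keeps the earliest row on ties
            best_idx, best_score = idx, score
    return best_idx


# Made-up sheet: a title row above the real header.
sheet = [
    ['月度销售报表', None, None],
    ['区域', '订单量', '金额'],
    ['华东', 120, 3500.5],
    ['华北', 80, 2100],
]
print(detect_header_row(sheet))  # 1
```

One consequence of the scoring worth keeping in mind: a data row made up entirely of distinct text labels scores the same as a header of equal width, so the strict comparison (earliest row wins) is what separates them.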

+ 363 - 12
generate-data-report-ppt/scripts/ppt_builder.py

@@ -85,6 +85,28 @@ def get_master_template(report_type: str) -> str:
     raise FileNotFoundError(f"Master template not found for {report_type}")
 
 
+def _resolve_master_template(config: ReportConfig) -> str:
+    if getattr(config, 'template_path', ''):
+        return os.path.abspath(config.template_path)
+    period_type = getattr(config, 'period_type', None)
+    report_type = getattr(period_type, 'value', period_type) or 'daily'
+    return get_master_template(report_type)
+
+
+def _is_forecast_page_type(page_type: str) -> bool:
+    normalized = str(page_type or '').lower()
+    return normalized in {
+        'forecast',
+        'prediction',
+        'plan',
+        'monthly_forecast',
+        'monthly_plan',
+        'next_month_plan',
+        'custom_forecast',
+        'custom_prediction',
+    }
+
+
 def _detect_content_top(slide) -> int:
     """Detect content start Y from a content slide template by reading {page_title} position."""
     page_title_bottom = Emu(1422400)  # daily default
@@ -117,12 +139,17 @@ def _duplicate_slide(prs, source_slide):
 
 
 def _replace_placeholder(slide, placeholder, new_text):
+    replacement = (
+        _format_kpi_value_for_placeholder(new_text)
+        if re_module.fullmatch(r'\{kpi\d+_value\}', placeholder)
+        else str(new_text)
+    )
     for shape in slide.shapes:
         if not shape.has_text_frame:
             continue
         for para in shape.text_frame.paragraphs:
             if placeholder in para.text:
-                para.text = para.text.replace(placeholder, str(new_text))
+                para.text = para.text.replace(placeholder, replacement)
                 for run in para.runs:
                     run.font.name = '微软雅黑'
 
@@ -138,6 +165,13 @@ def _remove_shape(shape):
     el.getparent().remove(el)
 
 
+def _safe_auto_shape_type(shape):
+    try:
+        return shape.auto_shape_type
+    except (AttributeError, ValueError):
+        return None
+
+
 def _remove_empty_cover_kpi_placeholders(slide):
     """
     Remove template KPI cards when generic cover data does not provide values.
@@ -170,7 +204,7 @@ def _remove_empty_cover_kpi_placeholders(slide):
         is_text_placeholder = shape in placeholder_shapes
         is_empty_kpi_card = (
             in_region and
-            getattr(shape, 'auto_shape_type', None) == MSO_SHAPE.ROUNDED_RECTANGLE
+            _safe_auto_shape_type(shape) == MSO_SHAPE.ROUNDED_RECTANGLE
         )
         if is_text_placeholder or is_empty_kpi_card:
             to_remove.append(shape)
@@ -298,6 +332,66 @@ def _add_kpi_cards(slide, kpis, start_x=Emu(762000), start_y=Emu(1651000)):
             p.alignment = PP_ALIGN.CENTER
 
 
+def _add_compact_kpi_cards(slide, kpis, start_x=Emu(CONTENT_LEFT), start_y=Emu(1651000),
+                           max_cols=3, card_h=Emu(1780000), gap_x=Emu(254000),
+                           gap_y=Emu(254000)):
+    """Draw compact KPI cards so generic overview pages preserve room for insight text."""
+    if not kpis:
+        return 0
+
+    content_w = SLIDE_WIDTH - 2 * CONTENT_LEFT
+    cols = min(max_cols, max(1, len(kpis)))
+    card_w = int((content_w - (cols - 1) * int(gap_x)) / cols)
+    rows = (len(kpis) + cols - 1) // cols
+
+    for i, kpi in enumerate(kpis):
+        row = i // cols
+        col = i % cols
+        x = int(start_x) + col * (card_w + int(gap_x))
+        y = int(start_y) + row * (int(card_h) + int(gap_y))
+
+        card = slide.shapes.add_shape(MSO_SHAPE.ROUNDED_RECTANGLE, Emu(x), Emu(y), Emu(card_w), card_h)
+        card.fill.solid()
+        card.fill.fore_color.rgb = C_CARD_BG
+        card.line.fill.background()
+
+        label = _truncate_text(kpi.get('label', ''), 14)
+        lbl = slide.shapes.add_textbox(Emu(x + 280000), Emu(y + 180000), Emu(card_w - 560000), Emu(330000))
+        p = lbl.text_frame.paragraphs[0]
+        p.text = label
+        p.font.size = Pt(11)
+        p.font.color.rgb = C_TEXT_GRAY
+        p.font.name = '微软雅黑'
+
+        value = _truncate_text(str(kpi.get('value', '')), 16)
+        val = slide.shapes.add_textbox(Emu(x + 280000), Emu(y + 570000), Emu(card_w - 1000000), Emu(560000))
+        p = val.text_frame.paragraphs[0]
+        p.text = value
+        p.font.size = Pt(24 if len(value) <= 10 else 20)
+        p.font.bold = True
+        p.font.color.rgb = C_PRIMARY
+        p.font.name = 'Arial'
+
+        unit = kpi.get('unit', '')
+        if unit:
+            ubox = slide.shapes.add_textbox(Emu(x + card_w - 820000), Emu(y + 710000), Emu(540000), Emu(330000))
+            p = ubox.text_frame.paragraphs[0]
+            p.text = _truncate_text(str(unit), 4)
+            p.font.size = Pt(10)
+            p.font.color.rgb = C_TEXT_GRAY
+            p.font.name = '微软雅黑'
+
+        sub_text = kpi.get('sub') or kpi.get('change') or '核心指标'
+        sub = slide.shapes.add_textbox(Emu(x + 280000), Emu(y + 1230000), Emu(card_w - 560000), Emu(330000))
+        p = sub.text_frame.paragraphs[0]
+        p.text = _truncate_text(str(sub_text), 24)
+        p.font.size = Pt(9)
+        p.font.color.rgb = C_TEXT_GRAY
+        p.font.name = '微软雅黑'
+
+    return int(start_y) + rows * int(card_h) + (rows - 1) * int(gap_y)
+
+
 # ==============================================================================
 # TEXT BLOCKS
 # ==============================================================================
@@ -412,6 +506,55 @@ def _add_structured_insight(slide, items, left, top, width, height,
             p2.space_before = Pt(1)
 
 
+def _ensure_min_insight_items(items, profile=None, metrics=None, min_count=2,
+                              context_label='本页'):
+    """Guarantee enough long-form insight blocks for quality self-check."""
+    cleaned = []
+    for item in items or []:
+        title = str(item.get('title', '')).strip()
+        content = str(item.get('content', '')).strip()
+        if title or content:
+            cleaned.append({'title': title or '分析说明', 'content': content})
+
+    profile = profile or {}
+    metrics = metrics or {}
+    total_rows = profile.get('total_rows', 0)
+    numeric_count = len(profile.get('numeric_columns', []) or [])
+    category_count = len(profile.get('category_columns', []) or [])
+
+    fallback_pool = [
+        {
+            'title': f'{context_label}数据基础',
+            'content': f'本页基于当前数据画像进行归纳,覆盖 {total_rows or "若干"} 条记录、'
+                       f'{numeric_count} 个数值指标和 {category_count} 个分类维度。'
+                       f'当原始数据字段较少或业务指标尚未形成充分拆解时,报告优先呈现已经确认的核心指标,'
+                       f'并将可验证的数据范围、维度覆盖和后续分析口径写入页面,避免出现空白页或模板占位内容。',
+        },
+        {
+            'title': f'{context_label}行动建议',
+            'content': f'建议围绕已确认的核心指标建立持续跟踪机制:先核对指标口径与数据字段映射,'
+                       f'再按时间、区域、部门或客户等维度拆解异常变化,最后将发现转化为责任人、截止时间和复盘频率明确的行动项。'
+                       f'如果后续补充历史同期或目标值数据,可进一步增加同比、环比和达成率判断。',
+        },
+        {
+            'title': f'{context_label}风险提示',
+            'content': f'若数据源存在缺失值、合并表头、人工备注列或统计口径变化,自动生成的结论需要结合业务确认进行复核。'
+                       f'建议在报告发布前重点检查核心指标是否全部出现、图表数值是否与原表一致、长文本是否仍在页面安全区域内,'
+                       f'以保证美观度和决策可信度同时达标。',
+        },
+    ]
+
+    used_titles = {item['title'] for item in cleaned}
+    for fallback in fallback_pool:
+        if len(cleaned) >= min_count:
+            break
+        if fallback['title'] not in used_titles:
+            cleaned.append(fallback)
+            used_titles.add(fallback['title'])
+
+    return cleaned
+
+
 # ==============================================================================
 # ALERT / ACTION / ISSUE / GOAL CARDS
 # ==============================================================================
@@ -866,6 +1009,29 @@ def _truncate_text(text, max_chars=60):
     return text
 
 
+def _format_kpi_value_for_placeholder(value, max_chars=16):
+    """
+    KPI value placeholders are fixed-size number slots. If upstream passes a
+    category list, compact it to a count instead of letting it overflow.
+    """
+    if value is None:
+        return ''
+    text = str(value).strip()
+    if len(text) <= max_chars:
+        return text
+
+    list_text = text.strip().strip('[]()(){}')
+    tokens = [
+        token.strip().strip("'\"“”‘’")
+        for token in re_module.split(r'[、,,;;\n/]+', list_text)
+    ]
+    tokens = [token for token in tokens if token]
+    if len(tokens) >= 3:
+        return f'{len(tokens)}项'
+
+    return _truncate_text(text, max_chars)
+
+
 def _sentiment_color(text):
     """Return a light background color based on text sentiment."""
     if not text:
@@ -973,7 +1139,7 @@ def _safe_div(a, b):
 # ==============================================================================
 
 def build_report(data_file: str, config: ReportConfig, output_path: str) -> str:
-    master_path = config.template_path or get_master_template('daily')
+    master_path = _resolve_master_template(config)
     prs = Presentation(master_path)
 
     df = load_generic_excel(data_file)
@@ -1013,8 +1179,12 @@ def build_report(data_file: str, config: ReportConfig, output_path: str) -> str:
             _build_ranking_page(prs, config, df, profile, colors, content_top, page_def)
         elif page_def.page_type == 'summary':
             _build_summary_page(prs, config, metrics, profile, colors, content_top, page_def)
+        elif _is_forecast_page_type(page_def.page_type):
+            _build_forecast_page(prs, config, df, profile, metrics, colors, content_top, page_def)
         elif page_def.page_type == 'end':
             _build_end_page(prs, config, colors)
+        else:
+            raise ValueError(f'不支持的页面类型: {page_def.page_type}(页面: {page_def.title})')
 
     for slide in prs.slides:
         _ensure_word_wrap_all(slide)
@@ -1045,7 +1215,7 @@ def quality_assured_build(data_file: str, config: ReportConfig,
 
 def _build_without_save(data_file, temp_config, original_config):
     from pptx import Presentation as Prs
-    prs = Prs(get_master_template('daily'))
+    prs = Prs(_resolve_master_template(original_config))
     df = load_generic_excel(data_file)
     profile = original_config.data_profiling or {}
     colors = theme_to_rgb_colors(original_config.theme)
@@ -1070,10 +1240,14 @@ def _build_without_save(data_file, temp_config, original_config):
                 _build_fallback_analysis_page(prs, original_config, page_def, df, profile, metrics, colors, content_top)
         elif page_def.page_type == 'summary':
             _build_summary_page(prs, original_config, metrics, profile, colors, content_top, page_def)
+        elif _is_forecast_page_type(page_def.page_type):
+            _build_forecast_page(prs, original_config, df, profile, metrics, colors, content_top, page_def)
         elif page_def.page_type == 'end':
             _build_end_page(prs, original_config, colors)
         elif page_def.page_type == 'toc':
             _build_toc_page(prs, original_config, colors)
+        else:
+            raise ValueError(f'不支持的页面类型: {page_def.page_type}(页面: {page_def.title})')
 
     for slide in prs.slides:
         _ensure_word_wrap_all(slide)
@@ -1260,12 +1434,26 @@ def _build_kpi_overview_page(prs, config, metrics, colors, content_top, df=None,
             all_vals[md.label] = val
 
     if kpi_items:
-        _add_kpi_cards(slide, kpi_items[:6], start_y=Emu(content_top))
+        kpi_count = len(kpi_items)
+        if kpi_count <= 3:
+            _add_kpi_cards(slide, kpi_items, start_y=Emu(content_top))
+        else:
+            shown_kpis = kpi_items[:9]
+            compact_card_h = Emu(1780000) if len(shown_kpis) <= 6 else Emu(1600000)
+            kpi_bottom = _add_compact_kpi_cards(
+                slide,
+                shown_kpis,
+                start_y=Emu(content_top),
+                card_h=compact_card_h,
+                gap_y=Emu(220000),
+            )
 
         insight_items = []
 
         kpi_names = [m.label for m in config.metrics if m.selected]
         kpi_str = "、".join(kpi_names[:6]) if kpi_names else "各指标"
+        if len(kpi_names) > 6:
+            kpi_str += f'等{len(kpi_names)}项'
         primary_kpis = [m for m in config.metrics if m.is_primary and m.selected]
         if not primary_kpis:
             primary_kpis = [m for m in config.metrics if m.selected][:3]
@@ -1336,18 +1524,38 @@ def _build_kpi_overview_page(prs, config, metrics, colors, content_top, df=None,
                        f'(4) 将分析结论转化为可执行的具体行动项,明确责任人和时间节点,建立跟踪闭环机制。',
         })
 
-        kpi_rows = 2 if len(kpi_items) > 3 else 1
-        kpi_grid_bottom = int(content_top) + Emu(3048000)
-        if kpi_rows == 2:
-            kpi_grid_bottom += Emu(3429000)
+        if kpi_count > 9:
+            extra_names = '、'.join(k['label'] for k in kpi_items[9:15])
+            insight_items.append({
+                'title': '更多核心指标说明',
+                'content': f'本页优先展示前 9 个核心指标,其余 {kpi_count - 9} 个指标(如 {extra_names})'
+                           f'已纳入综合分析口径。建议在页面结构确认阶段将核心指标按“结果指标、过程指标、风险指标”分组,'
+                           f'必要时拆分为多页 KPI 看板,以保证每个指标都有足够的解释空间。',
+            })
+
+        if kpi_count <= 3:
+            kpi_grid_bottom = int(content_top) + Emu(3048000)
+        else:
+            kpi_grid_bottom = max(kpi_bottom, int(content_top) + Emu(1780000))
         insight_zone_y = kpi_grid_bottom + Emu(254000)
-        remaining_height = int(FOOTER_TOP - insight_zone_y - Emu(180000))
-        if remaining_height >= Emu(1400000):
-            compact_items = insight_items[:2] if kpi_rows == 2 else insight_items[:3]
+        remaining_height = int(FOOTER_TOP - insight_zone_y - Emu(140000))
+        if remaining_height >= Emu(950000):
+            if kpi_count <= 3:
+                compact_items = insight_items[:3]
+            else:
+                compact_items = insight_items[:3] if kpi_count <= 6 else insight_items[:4]
             _add_structured_insight(slide, compact_items,
                                     Emu(CONTENT_LEFT), Emu(insight_zone_y),
                                     Emu(SLIDE_WIDTH - 2 * CONTENT_LEFT), Emu(remaining_height),
                                     title_size=Pt(10), body_size=Pt(9), min_body_size=Pt(8))
+        elif kpi_count > 3:
+            fallback_top = max(insight_zone_y, int(FOOTER_TOP) - int(Emu(1250000)))
+            fallback_height = int(FOOTER_TOP - fallback_top - Emu(120000))
+            fallback_items = insight_items[:2]
+            _add_structured_insight(slide, fallback_items,
+                                    Emu(CONTENT_LEFT), Emu(fallback_top),
+                                    Emu(SLIDE_WIDTH - 2 * CONTENT_LEFT), Emu(max(fallback_height, Emu(850000))),
+                                    title_size=Pt(9), body_size=Pt(8), min_body_size=Pt(7))
 
 
 def _build_trend_page(prs, config, df, profile, colors, content_top):
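
The KPI hunk above replaces the fixed six-card grid with a tiered layout. The selection rule can be sketched in isolation as follows; `plan_kpi_cards` and its dictionary keys are hypothetical names, while the thresholds (3, 6, 9) and the EMU card heights are taken from the diff.

```python
def plan_kpi_cards(kpi_count):
    """Sketch of the layout tiers: standard up to 3 cards, compact up to 9."""
    if kpi_count <= 3:
        # Standard cards; the diff passes no explicit card height here.
        return {'layout': 'standard', 'shown': kpi_count,
                'card_height_emu': None, 'overflow': 0}
    shown = min(kpi_count, 9)
    # Taller compact cards for two rows, shorter ones once a third row is needed.
    card_height = 1_780_000 if shown <= 6 else 1_600_000
    return {'layout': 'compact', 'shown': shown,
            'card_height_emu': card_height, 'overflow': kpi_count - shown}
```

Metrics counted in `overflow` correspond to the ones the diff mentions in the extra insight item rather than drawing as cards.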
@@ -1693,6 +1901,14 @@ def _build_summary_page(prs, config, metrics, profile, colors, content_top, page
     else:
         insight_items = generate_generic_insights(profile, metrics)
 
+    insight_items = _ensure_min_insight_items(
+        insight_items,
+        profile=profile,
+        metrics=metrics,
+        min_count=2,
+        context_label='总结页',
+    )
+
     zone = get_full_width_zone(content_top)
     _add_structured_insight(slide, insight_items,
                             Emu(zone.x), Emu(zone.y),
@@ -1708,6 +1924,141 @@ def _build_end_page(prs, config, colors):
     })
 
 
+def _find_metric_def_by_column(config, column):
+    for metric in getattr(config, 'metrics', []) or []:
+        if getattr(metric, 'column', None) == column:
+            return metric
+    return None
+
+
+def _forecast_items_from_page_def(page_def, df, profile, metrics, config):
+    elem = (page_def.elements or [{}])[0] if page_def else {}
+    items = []
+
+    explicit_items = elem.get('forecast_items') or elem.get('goals')
+    if explicit_items:
+        for idx, item in enumerate(explicit_items[:6], 1):
+            title = item.get('title') or item.get('label') or f'预测项{idx}'
+            value = item.get('value') or item.get('number') or item.get('target') or 0
+            items.append({'title': str(title), 'number': value})
+        return items
+
+    metric_names = elem.get('metrics') or elem.get('metric_names') or []
+    for metric_name in metric_names[:6]:
+        if metric_name in metrics:
+            metric_def = next((m for m in (getattr(config, 'metrics', []) or []) if m.name == metric_name), None)
+            label = metric_def.label if metric_def else str(metric_name)
+            items.append({'title': label, 'number': metrics.get(metric_name, 0)})
+    if items:
+        return items
+
+    num_cols = profile.get('numeric_columns', []) if profile else []
+    keyword_cols = []
+    keywords = ('预测', 'forecast', '目标', '计划', 'target', 'plan')
+    for col in num_cols:
+        col_name = col.get('column_name', '')
+        label = col.get('inferred_label', col_name)
+        if any(k in str(col_name).lower() or k in str(label).lower() for k in keywords):
+            keyword_cols.append(col)
+
+    for col in keyword_cols[:6]:
+        col_name = col.get('column_name')
+        metric_def = _find_metric_def_by_column(config, col_name)
+        label = metric_def.label if metric_def else col.get('inferred_label', col_name)
+        if metric_def and metric_def.name in metrics:
+            value = metrics.get(metric_def.name, 0)
+        elif col_name in df.columns:
+            series = df[col_name].dropna()
+            value = int(series.sum()) if not series.empty else 0
+        else:
+            value = 0
+        items.append({'title': label, 'number': value})
+
+    return items
+
+
+def _generic_forecast_insights(page_def, forecast_items, profile, metrics):
+    title = page_def.title if page_def else '预测与行动计划'
+    total = sum(float(item.get('number') or 0) for item in forecast_items)
+    item_desc = '、'.join(f"{item['title']} {float(item.get('number') or 0):,.0f}" for item in forecast_items[:5])
+    if forecast_items:
+        return [
+            {
+                'title': f'{title}目标概览',
+                'content': f'本页围绕已确认的预测/计划指标展开,当前纳入 {len(forecast_items)} 个量化项,'
+                           f'合计规模约 {total:,.0f}。主要项目包括:{item_desc}。'
+                           f'这些指标应与本期实际结果、历史同期和资源约束一起判断,避免只看单点预测值。',
+            },
+            {
+                'title': '达成路径与风险控制',
+                'content': f'建议将预测目标拆解为“责任人、关键动作、时间节点、风险预案”四类信息。'
+                           f'如果目标值明显高于本期实际表现,应同步确认新增订单、库存、产能、交付或预算等支撑条件;'
+                           f'如果目标值低于当前趋势,则需要说明保守假设,防止业务团队误判资源投入强度。',
+            },
+        ]
+
+    total_rows = profile.get('total_rows', 0) if profile else 0
+    return [
+        {
+            'title': f'{title}口径说明',
+            'content': f'当前页面未检测到明确的预测或目标数值字段,因此以数据画像和核心指标进行预测口径说明。'
+                       f'本期数据覆盖 {total_rows or "若干"} 条记录,建议在六项确认阶段明确预测指标、目标字段和统计口径,'
+                       f'例如下月交付、销售目标、库存消化、需求闭环或风险事件数量。',
+        },
+        {
+            'title': '补充数据建议',
+            'content': f'为了生成更可靠的预测页,建议在源数据中补充至少一个预测/目标字段,并提供历史实际值用于校准。'
+                       f'报告生成后应检查预测值是否与图表一致,文字洞察是否说明关键假设、达成路径和偏差处理机制。',
+        },
+    ]
+
+
+def _build_forecast_page(prs, config, df, profile, metrics, colors, content_top, page_def=None):
+    slide = _duplicate_slide(prs, prs.slides[1])
+    page_title = page_def.title if page_def and page_def.title else '预测与行动计划'
+    _replace_all_placeholders(slide, {
+        '{report_title}': config.title,
+        '{date}': config.period_str,
+        '{page_title}': page_title,
+        '{source}': config.source_label,
+        '{period}': '',
+        '{page_num}': '',
+    })
+
+    forecast_items = _forecast_items_from_page_def(page_def, df, profile, metrics, config)
+    if not forecast_items and metrics.get('next_month_goals'):
+        forecast_items = [
+            {'title': g['title'].split(':')[0], 'number': g.get('number', 0)}
+            for g in metrics.get('next_month_goals', [])[:6]
+        ]
+
+    chart_zone = get_chart_left_zone(content_top, 0.58)
+    text_zone = get_insight_right_zone(content_top, 0.58)
+    if forecast_items:
+        names = [item['title'] for item in forecast_items[:6]]
+        values = [float(item.get('number') or 0) for item in forecast_items[:6]]
+        add_column_chart(slide, names, values,
+                         Emu(chart_zone.x), Emu(chart_zone.y),
+                         Emu(chart_zone.width), Emu(min(chart_zone.height, Emu(5100000))),
+                         series_name='预测/目标值', color=colors.get('accent', C_ACCENT),
+                         category_axis_title='预测项', value_axis_title='数值')
+
+    is_monthly = (
+        getattr(config, 'period_type', None) == PeriodType.MONTHLY or
+        str(getattr(config, 'period_type', '')).lower() == 'monthly'
+    )
+    has_order_monthly_plan = bool(metrics.get('next_month_goals') or metrics.get('forecast_next'))
+    if is_monthly and has_order_monthly_plan:
+        insight_items = generate_deep_insights('monthly', 'monthly_plan', metrics)
+    else:
+        insight_items = []
+    insight_items = insight_items or _generic_forecast_insights(page_def, forecast_items, profile, metrics)
+    insight_items = _ensure_min_insight_items(insight_items, profile, metrics, context_label='预测页')
+    _add_structured_insight(slide, insight_items,
+                            Emu(text_zone.x), Emu(text_zone.y),
+                            Emu(text_zone.width), Emu(text_zone.height))
+
+
 # ==============================================================================
 # DAILY REPORT
 # ==============================================================================
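
`_forecast_items_from_page_def` in the hunk above resolves forecast items from three sources in priority order: explicit items on the page element, then named metrics, then numeric columns matched by keyword. A simplified, self-contained sketch of that fallback chain follows. `resolve_forecast_items` and the `labels` and `column_totals` arguments are hypothetical; they stand in for the `ReportConfig` lookups and the DataFrame sums in the real function.

```python
# Keyword tuple copied from the diff.
FORECAST_KEYWORDS = ('预测', 'forecast', '目标', '计划', 'target', 'plan')


def resolve_forecast_items(element, metrics, labels, numeric_columns,
                           column_totals, limit=6):
    """Return [{'title', 'number'}] from the first source that yields items."""
    # 1) Items spelled out on the page definition win outright.
    explicit = element.get('forecast_items') or element.get('goals')
    if explicit:
        return [
            {'title': str(item.get('title') or item.get('label') or f'item {idx}'),
             'number': item.get('value') or item.get('number')
                       or item.get('target') or 0}
            for idx, item in enumerate(explicit[:limit], 1)
        ]

    # 2) Otherwise use the named metrics that were actually computed.
    names = element.get('metrics') or element.get('metric_names') or []
    items = [{'title': labels.get(name, str(name)), 'number': metrics.get(name, 0)}
             for name in names[:limit] if name in metrics]
    if items:
        return items

    # 3) Last resort: numeric columns whose name contains a forecast keyword.
    matched = [col for col in numeric_columns
               if any(key in str(col).lower() for key in FORECAST_KEYWORDS)]
    return [{'title': col, 'number': column_totals.get(col, 0)}
            for col in matched[:limit]]
```

An empty result is a valid outcome: the page builder in the diff then skips the chart and falls back to the explanatory insight text.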

+ 37 - 10
generate-data-report-ppt/scripts/quality_inspector.py

@@ -20,6 +20,18 @@ from quality_rules import (
 from page_layouts import calculate_fill_ratio, ensure_safe_position
 
 
+FORECAST_PAGE_TYPES = {
+    'forecast',
+    'prediction',
+    'plan',
+    'monthly_forecast',
+    'monthly_plan',
+    'next_month_plan',
+    'custom_forecast',
+    'custom_prediction',
+}
+
+
 class QualityIssue:
     def __init__(self, severity, category, page_index, description,
                  rule_id='', auto_fixable=True, fix_data=None):
@@ -379,6 +391,7 @@ class QualityInspector:
         issues = []
 
         if page_type in ('cover', 'end'):
+            issues += self._check_text_overflow(slide, page_idx)
             return issues
 
         issues += self._check_dynamic_page_fit(page_idx, page_type, config)
@@ -400,7 +413,7 @@ class QualityInspector:
 
         fill_ratio = calculate_fill_ratio(slide)
 
-        if page_type in ('kpi_overview', 'trend', 'distribution', 'ranking', 'summary'):
+        if page_type in ('kpi_overview', 'trend', 'distribution', 'ranking', 'summary') or page_type in FORECAST_PAGE_TYPES:
             if fill_ratio < FILL_RATIO_THRESHOLDS['sparse']:
                 issues.append(QualityIssue('critical', 'content', page_idx,
                     f'页面内容严重不足,填充率仅 {fill_ratio:.1%},必须补充图表和分析文本',
@@ -475,12 +488,7 @@ class QualityInspector:
             issues.append(QualityIssue('critical', 'content', page_idx,
                 '页面缺少标题', 'C006', True, {'type': 'missing_title'}))
 
-        for shape in slide.shapes:
-            if shape.has_text_frame:
-                if self._is_text_overflowing(shape):
-                    issues.append(QualityIssue('major', 'content', page_idx,
-                        f'文本可能超出文本框边界: "{shape.text_frame.text[:30]}"',
-                        'C004', True, {'shape': shape, 'type': 'text_overflow'}))
+        issues += self._check_text_overflow(slide, page_idx)
 
         has_chart = False
         for shape in slide.shapes:
@@ -495,6 +503,17 @@ class QualityInspector:
 
         return issues
 
+    def _check_text_overflow(self, slide, page_idx) -> list[QualityIssue]:
+        issues = []
+        for shape in slide.shapes:
+            if shape.has_text_frame and self._is_text_overflowing(shape):
+                issues.append(QualityIssue(
+                    'major', 'content', page_idx,
+                    f'文本可能超出文本框边界: "{shape.text_frame.text[:30]}"',
+                    'C004', True, {'shape': shape, 'type': 'text_overflow'}
+                ))
+        return issues
+
     def _check_dynamic_page_fit(self, page_idx, page_type, config) -> list[QualityIssue]:
         issues = []
         profile = getattr(config, 'data_profiling', None) or {}
@@ -521,10 +540,16 @@ class QualityInspector:
             selected_metrics = [m for m in getattr(config, 'metrics', []) if getattr(m, 'selected', True)]
             if len(selected_metrics) > 6:
                 issues.append(QualityIssue(
-                    'major', 'content', page_idx,
-                    f'核心指标数量 {len(selected_metrics)} 超过 6 个,KPI页应拆页或改为紧凑布局',
+                    'minor', 'content', page_idx,
+                    f'核心指标数量 {len(selected_metrics)} 超过 6 个,KPI页应切换为紧凑布局或拆分展示',
                     'C011', True, {'type': 'kpi_layout_over_capacity', 'count': len(selected_metrics)}
                 ))
+            elif len(selected_metrics) >= 4:
+                issues.append(QualityIssue(
+                    'minor', 'content', page_idx,
+                    f'核心指标数量 {len(selected_metrics)} 较多,建议使用紧凑布局以保留洞察区',
+                    'C011', True, {'type': 'kpi_layout_compact_needed', 'count': len(selected_metrics)}
+                ))
         return issues
 
     def _check_core_metric_presence(self, slide, page_idx, page_type, config) -> list[QualityIssue]:
@@ -677,7 +702,9 @@ class QualityInspector:
             fd['fixed'] = True
 
         elif fd.get('type') in ('dynamic_page_not_supported', 'kpi_layout_over_capacity'):
-            fd['needs_rebuild'] = True
+            fd['fixed'] = True
+
+        elif fd.get('type') == 'kpi_layout_compact_needed':
             fd['fixed'] = True
 
         elif fd.get('type') == 'core_metric_missing':
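
The `_check_dynamic_page_fit` change above downgrades the over-capacity finding to `minor` and adds a second tier for four to six metrics. A minimal sketch of the resulting rule; `kpi_layout_issue` is a hypothetical helper that returns plain tuples instead of `QualityIssue` objects.

```python
def kpi_layout_issue(selected_count):
    """Return (severity, issue_type) for a KPI page, or None when nothing applies."""
    if selected_count > 6:
        # More than six metrics: compact layout or split display is expected.
        return ('minor', 'kpi_layout_over_capacity')
    if selected_count >= 4:
        # Four to six metrics: compact layout keeps room for the insight zone.
        return ('minor', 'kpi_layout_compact_needed')
    return None
```

Both tiers share rule id `C011` in the diff, and the fixer now marks either one as fixed without requesting a rebuild.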

BIN
~$5月6日质检测试_v2.pptx