
Context

The Nano Banana App currently uses PySide6 for its GUI and relies on a traditional file dialog for image uploads. Users must click the "添加图片" ("Add Image") button, navigate the file explorer, and select images. This workflow is inefficient for users who frequently work with screenshots or web images, or who need to add multiple reference images quickly.

Goals / Non-Goals

Goals

  • Enable drag-and-drop of image files directly onto the preview area
  • Support paste functionality for clipboard images (screenshots, copied web images)
  • Maintain all existing functionality without breaking changes
  • Provide clear visual feedback during drag operations
  • Enhance image validation so that invalid files are rejected before they are stored

Non-Goals

  • Replace existing file dialog upload method
  • Support non-image file drag operations
  • Implement complex file management features
  • Change existing image storage structure

Decisions

Decision: Extend Existing UI Components

  • What: Enhance the existing QScrollArea and image preview container to accept drops
  • Why: Maintains current layout and behavior while adding new capabilities
  • How: Subclass existing components or add event handlers to current widgets

Decision: Use Qt's Built-in Drag-and-Drop Framework

  • What: Implement dragEnterEvent, dropEvent, and related Qt methods
  • Why: Native Qt support provides cross-platform compatibility and consistent behavior
  • How: Enable setAcceptDrops(True) on target widgets and handle QMimeData
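
A minimal sketch of the handler logic follows. It is written as plain functions that a widget's `dragEnterEvent` and `dropEvent` overrides could delegate to, so it only relies on the `QMimeData`, `QUrl`, and drop-event methods named in the code. The function names, the `on_images` callback, and the suffix allow-list are assumptions for illustration, not part of the existing code base.

```python
from pathlib import Path

# Hypothetical allow-list; the formats the app really supports may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp"}


def local_image_paths(mime) -> list[str]:
    """Return local file paths in a QMimeData payload that look like images."""
    if not mime.hasUrls():
        return []
    paths = []
    for url in mime.urls():
        # Remote URLs (for example an image dragged from a browser) are skipped here.
        if url.isLocalFile():
            path = url.toLocalFile()
            if Path(path).suffix.lower() in IMAGE_SUFFIXES:
                paths.append(path)
    return paths


def handle_drag_enter(event) -> None:
    """Call from dragEnterEvent: accept only drags that carry image files."""
    if local_image_paths(event.mimeData()):
        event.acceptProposedAction()
    else:
        event.ignore()


def handle_drop(event, on_images) -> None:
    """Call from dropEvent: pass the dropped image paths to a callback."""
    paths = local_image_paths(event.mimeData())
    if paths:
        on_images(paths)
        event.acceptProposedAction()
    else:
        event.ignore()
```

The target widget would call `setAcceptDrops(True)` once and forward its two event overrides to these functions. The suffix check is only a first filter; content validation would still run afterwards.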

Decision: Support Multiple Input Methods

  • What: File dialog, drag-and-drop, and clipboard paste all coexist
  • Why: Users have different preferences and workflows
  • How: Route all inputs through a unified image validation and storage system
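
One way to picture the unified route is a single intake object that every input method calls. The sketch below is an assumption about shape only: `ImageIntake`, its `validate` callback, and the `source` labels are hypothetical names, while `uploaded_images` mirrors the list mentioned later in this document.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ImageIntake:
    """Single entry point shared by file dialog, drag-and-drop, and paste."""

    validate: Callable[[str], bool]  # injected validator (hypothetical)
    uploaded_images: list[str] = field(default_factory=list)
    rejected: list[tuple[str, str]] = field(default_factory=list)

    def add(self, paths: list[str], source: str) -> int:
        """Validate each path, store the ones that pass, return how many were added."""
        added = 0
        for path in paths:
            if self.validate(path):
                self.uploaded_images.append(path)
                added += 1
            else:
                # Keep the source so the UI can report where a bad file came from.
                self.rejected.append((path, source))
        return added


# Usage: every input method calls the same add() method.
intake = ImageIntake(validate=lambda p: p.lower().endswith(".png"))
added_from_drop = intake.add(["a.png", "notes.txt"], source="drop")
added_from_dialog = intake.add(["b.PNG"], source="dialog")
```

Keeping validation behind one method means a new input source cannot bypass it by accident.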

Risks / Trade-offs

Risk: File Type Security

  • Risk: Malicious files could be dropped/pasted into the application
  • Mitigation: Implement MIME type checking, file header validation, and size limits
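
The header and size checks can be sketched with the standard library alone. The signature table covers a few common formats, and the 20 MB limit and function names are placeholders chosen for illustration; the real limits and format list are still open.

```python
import os
from typing import Optional

MAX_BYTES = 20 * 1024 * 1024  # hypothetical size limit

# Leading "magic" bytes for a few common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"BM": "image/bmp",
}


def sniff_image_type(header: bytes) -> Optional[str]:
    """Map the first bytes of a file to a MIME type, or None if unrecognized."""
    for magic, mime in _SIGNATURES.items():
        if header.startswith(magic):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_image_file(path: str, max_bytes: int = MAX_BYTES) -> tuple[bool, str]:
    """Return (True, mime_type) if the file passes, else (False, reason)."""
    try:
        size = os.path.getsize(path)
        if size == 0:
            return False, "file is empty"
        if size > max_bytes:
            return False, "file exceeds size limit"
        with open(path, "rb") as f:
            header = f.read(12)
    except OSError as exc:
        return False, f"cannot read file: {exc}"
    mime = sniff_image_type(header)
    if mime is None:
        return False, "unrecognized image header"
    return True, mime
```

A matching header only shows that the file claims to be an image. Actually decoding it (for example when the thumbnail is generated) remains the final check.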

Trade-off: Complex Event Handling

  • Trade-off: More complex event handling code vs. improved user experience
  • Decision: Accept complexity for significant UX improvement

Risk: Cross-platform Clipboard Variations

  • Risk: Different clipboard behaviors across Windows, macOS, Linux
  • Mitigation: Use Qt's unified clipboard API with fallbacks
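
As a rough sketch of what "with fallbacks" could mean, the function below inspects a `QMimeData` object (for example the one returned by `QGuiApplication.clipboard().mimeData()`) and picks an import strategy. The function name, the returned labels, and the order of checks are assumptions; local files are checked first so that a copied file is imported from disk rather than from any image representation the platform may attach to it.

```python
def classify_clipboard(mime) -> tuple[str, object]:
    """Decide how to import clipboard contents: ("files", paths), ("image", data) or ("none", None)."""
    # 1. Local files copied in a file manager.
    if mime.hasUrls():
        paths = [u.toLocalFile() for u in mime.urls() if u.isLocalFile()]
        if paths:
            return "files", paths
    # 2. Raw image data such as a screenshot or an image copied from a web page.
    if mime.hasImage():
        return "image", mime.imageData()
    # 3. Nothing usable; the caller can ignore the paste or show a message.
    return "none", None
```

Because the function only calls `hasUrls()`, `urls()`, `hasImage()`, and `imageData()`, it can be exercised with a stand-in object in unit tests on any platform.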

Migration Plan

  1. Phase 1: Implement drag-and-drop file support

    • Add event handlers to image preview area
    • Implement file validation and processing
    • Add visual feedback (border highlighting)
  2. Phase 2: Add clipboard paste support

    • Implement paste event handling
    • Add temporary file handling for clipboard images
    • Integrate with existing preview system
  3. Phase 3: Enhance validation and error handling

    • Improve file type detection
    • Add user-friendly error messages
    • Implement size limits and warnings
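
For the temporary file handling in Phase 2, one possible shape is a small store that owns a temporary directory for pasted images and removes it on shutdown. The class name, file naming scheme, and directory prefix below are hypothetical. The sketch takes already-encoded PNG bytes; with a `QImage` in hand, calling `QImage.save(path, "PNG")` on the generated path would be an alternative.

```python
import tempfile
from pathlib import Path


class ClipboardTempStore:
    """Owns a temporary directory for pasted images and removes it on close."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory(prefix="nano_banana_paste_")
        self._count = 0

    def save(self, png_bytes: bytes) -> Path:
        """Write one pasted image to a uniquely numbered file and return its path."""
        self._count += 1
        path = Path(self._dir.name) / f"pasted_{self._count:03d}.png"
        path.write_bytes(png_bytes)
        return path

    def close(self) -> None:
        """Delete the directory and every pasted image in it."""
        self._dir.cleanup()
```

Returning a path lets pasted images flow through the same path-based preview and delete logic as files chosen in the dialog.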

Open Questions

  • Should we support drag-and-drop onto the entire application window or just the image area?
  • What should be the maximum clipboard image size to prevent memory issues?
  • Should we provide different visual feedback for different drag content types?
  • How should we handle duplicate images from different sources?
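
On the duplicate question, one option worth weighing is to compare content hashes instead of file names. The sketch below is only an illustration of that option; `content_digest` and `DuplicateFilter` are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def content_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DuplicateFilter:
    """Remembers the digests of accepted images and flags byte-identical repeats."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, path: str) -> bool:
        digest = content_digest(path)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

This only catches byte-identical files. The same picture pasted from the clipboard and re-encoded would hash differently, so the approach does not settle the question on its own.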

Implementation Details

Technical Architecture

Input Sources:
├── File Dialog (existing)
├── Drag-and-Drop (new)
│   ├── Files from Explorer/Finder
│   └── Images from Web Browsers
└── Clipboard Paste (new)
    ├── Screenshots (PrtScn, Win+Shift+S)
    └── Copied Images (Ctrl+C)

↓ Unified Processing Pipeline:

Image Validation:
├── MIME Type Check
├── File Header Validation
├── Size Limits
└── Format Support Check

↓ Storage & Display:

Image Management:
├── Add to self.uploaded_images list
├── Generate thumbnail preview
├── Update UI count and layout
└── Maintain existing delete functionality

Event Handling Flow

  1. Drag Enter Event: Validate drag content, accept if valid images
  2. Drag Move Event: Update visual feedback (highlight, cursor)
  3. Drop Event: Process dropped files, add to upload list
  4. Paste Event: Check clipboard for image data, process if present
  5. Validation: Run the unified validation on images from every input source
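
The visual feedback in the flow above can be kept separate from the widgets by tracking the highlight as a small piece of state. In this sketch the class name and style strings are placeholders, and `apply_style` stands for whatever the app uses to restyle the preview area (for instance a widget's `setStyleSheet`). It also assumes the highlight is cleared when the drag leaves the area or ends in a drop, which the flow does not spell out.

```python
from typing import Callable

# Placeholder style strings; the real app would use its own theme values.
HIGHLIGHT_STYLE = "border: 2px dashed #4a90e2;"
NORMAL_STYLE = ""


class DragHighlight:
    """Tracks whether the drop target should currently be highlighted."""

    def __init__(self, apply_style: Callable[[str], None]) -> None:
        self._apply_style = apply_style
        self.active = False

    def _set(self, active: bool) -> None:
        # Restyle only on a real change, since drag-move events fire repeatedly.
        if active != self.active:
            self.active = active
            self._apply_style(HIGHLIGHT_STYLE if active else NORMAL_STYLE)

    def on_drag_enter(self, acceptable: bool) -> None:
        self._set(acceptable)

    def on_drag_leave(self) -> None:
        self._set(False)

    def on_drop(self) -> None:
        self._set(False)
```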

Key Components to Modify

  • image_generator.py: Main application file
  • upload_images() method: Extend to handle multiple input sources
  • update_image_preview() method: Reuse existing preview logic
  • Image preview area: Add drag-and-drop event handlers
  • Main window: Add paste event handling