Context

The Nano Banana App currently uses PySide6 for its GUI and relies on a traditional file dialog for image uploads. Users must click the "Add Image" (添加图片) button, navigate the file explorer, and select images. This workflow is inefficient for users who frequently work with screenshots or web images, or who need to add several reference images quickly.


Goals / Non-Goals


Goals


Enable drag-and-drop of image files directly onto the preview area
Support paste functionality for clipboard images (screenshots, copied web images)
Maintain all existing functionality without breaking changes
Provide clear visual feedback during drag operations
Strengthen image validation so that invalid files are rejected


Non-Goals


Replace existing file dialog upload method
Support non-image file drag operations
Implement complex file management features
Change existing image storage structure


Decisions


Decision: Extend Existing UI Components


What: Enhance the existing QScrollArea and image preview container to accept drops

Why: Maintains current layout and behavior while adding new capabilities

How: Subclass existing components or add event handlers to current widgets


Decision: Use Qt's Built-in Drag-and-Drop Framework


What: Implement dragEnterEvent, dropEvent, and related Qt methods

Why: Native Qt support provides cross-platform compatibility and consistent behavior

How: Call setAcceptDrops(True) on target widgets and inspect the QMimeData carried by each drag event


Decision: Support Multiple Input Methods


What: File dialog, drag-and-drop, and clipboard paste all coexist

Why: Users have different preferences and workflows

How: Route all inputs through a unified image validation and storage system


Risks / Trade-offs


Risk: File Type Security


Risk: Malicious files could be dropped/pasted into the application

Mitigation: Implement MIME type checking, file header validation, and size limits
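The three mitigations could be combined roughly as follows. This is a sketch under stated assumptions: `validate_image_file`, `sniff_image_format`, the 20 MiB limit, and the set of recognized signatures are illustrative and not taken from the application. The MIME check shown is only an extension-based guess from the standard library.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple

# Assumed limit; the real maximum is still an open question in this document.
MAX_IMAGE_BYTES = 20 * 1024 * 1024

# Leading signature bytes for a few common formats (illustrative subset).
MAGIC_PREFIXES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
    "bmp": b"BM",
}


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return a format name if the header matches a known signature, else None."""
    for name, prefix in MAGIC_PREFIXES.items():
        if header.startswith(prefix):
            return name
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def validate_image_file(path: str, max_bytes: int = MAX_IMAGE_BYTES) -> Tuple[bool, str]:
    """Return (True, format) if the file passes all checks, else (False, reason)."""
    p = Path(path)
    if not p.is_file():
        return False, "not a regular file"

    # MIME check: guessed from the file name only, so it is a weak first filter.
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("image/"):
        return False, "extension does not map to an image MIME type"

    # Size limit.
    size = p.stat().st_size
    if size == 0:
        return False, "file is empty"
    if size > max_bytes:
        return False, f"file exceeds {max_bytes} bytes"

    # Header validation: compare the first bytes against known signatures.
    with p.open("rb") as fh:
        header = fh.read(16)
    fmt = sniff_image_format(header)
    if fmt is None:
        return False, "unrecognized image header"
    return True, fmt
```

Returning a reason string alongside the boolean keeps the door open for the user-friendly error messages planned in Phase 3.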


Trade-off: Complex Event Handling


Trade-off: More complex event handling code vs. improved user experience

Decision: Accept complexity for significant UX improvement


Risk: Cross-platform Clipboard Variations


Risk: Different clipboard behaviors across Windows, macOS, Linux

Mitigation: Use Qt's unified clipboard API with fallbacks


Migration Plan


Phase 1: Implement drag-and-drop file support


Add event handlers to image preview area
Implement file validation and processing
Add visual feedback (border highlighting)


Phase 2: Add clipboard paste support


Implement paste event handling
Add temporary file handling for clipboard images
Integrate with existing preview system


Phase 3: Enhance validation and error handling


Improve file type detection
Add user-friendly error messages
Implement size limits and warnings


Open Questions


Should we support drag-and-drop onto the entire application window or just the image area?
What should be the maximum clipboard image size to prevent memory issues?
Should we provide different visual feedback for different drag content types?
How should we handle duplicate images from different sources?


Implementation Details


Technical Architecture

Input Sources:
├── File Dialog (existing)
├── Drag-and-Drop (new)
│   ├── Files from Explorer/Finder
│   └── Images from Web Browsers
└── Clipboard Paste (new)
    ├── Screenshots (PrtScn, Win+Shift+S)
    └── Copied Images (Ctrl+C)

↓ Unified Processing Pipeline:

Image Validation:
├── MIME Type Check
├── File Header Validation
├── Size Limits
└── Format Support Check

↓ Storage & Display:

Image Management:
├── Add to self.uploaded_images list
├── Generate thumbnail preview
├── Update UI count and layout
└── Maintain existing delete functionality
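The diagram above can be sketched as a single entry point that every input source calls. `ImageStore`, `add_images`, and the callback parameters are hypothetical; only the `uploaded_images` list and the idea of refreshing the preview come from this document. The validator is injected so the sketch stays independent of any particular validation rules.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class ImageStore:
    """Hypothetical stand-in for the window state that owns uploaded_images."""

    def __init__(
        self,
        validate: Callable[[str], Tuple[bool, str]],
        on_change: Optional[Callable[[], None]] = None,
    ):
        self.validate = validate        # returns (ok, detail) for a path
        self.on_change = on_change      # e.g. a call into update_image_preview()
        self.uploaded_images: List[str] = []

    def add_images(self, paths: Iterable[str], source: str) -> List[Tuple[str, str]]:
        """Validate and store paths from any source; return (path, reason) rejections."""
        rejected = []
        added = False
        for path in paths:
            ok, detail = self.validate(path)
            if ok:
                self.uploaded_images.append(path)
                added = True
            else:
                # Record the source so error messages can say where the file came from.
                rejected.append((path, f"{source}: {detail}"))
        # Refresh the UI once per batch rather than once per image.
        if added and self.on_change is not None:
            self.on_change()
        return rejected
```

With this shape, the file dialog, the drop handler, and the paste handler differ only in how they obtain paths and in the `source` label they pass, for example `store.add_images(paths, source="drop")`.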


Event Handling Flow


Drag Enter Event: Inspect the drag payload and accept it only if it contains valid images

Drag Move Event: Update visual feedback (highlight, cursor)

Drop Event: Process dropped files, add to upload list

Paste Event: Check clipboard for image data, process if present

Validation Step: Apply the same validation to every input source before storage


Key Components to Modify


image_generator.py: Main application file

upload_images() method: Extend to handle multiple input sources

update_image_preview() method: Reuse existing preview logic
Image preview area: Add drag-and-drop event handlers
Main window: Add paste event handling