DeepSeek Janus-Pro in ComfyUI: a dual workflow for image description and image redrawing

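With this workflow, a single model covers both "image description" and "image generation". The model describes an input image and then generates a new image from that description, which gives an "image redraw" function. Because the prompt comes from the model's own description rather than a hand-written one, the result is more controllable: there is less randomness in the prompt wording and less ambiguity in how the model interprets it, so the generated image is more accurate.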

程序员筱筱 · Published 2025-02-06 10:44:21

Background:

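On January 27, 2025, the Chinese AI company DeepSeek open-sourced its multimodal model Janus-Pro-7B.
On the GenEval and DPG-Bench benchmarks, Janus-Pro-7B scored above established models such as OpenAI's DALL-E 3 and Stable Diffusion.
Notably, the model has only 7B parameters, yet it supports both image generation and multimodal understanding, and it runs smoothly on a high-end consumer PC.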

Model features and download:

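Janus-Pro is an autoregressive framework that unifies multimodal understanding and generation.
Janus-Pro uses a "dual-path" design, with a separate visual path for each task:
Understanding: a SigLIP-L vision encoder that accepts 384×384 pixel input and parses image semantics.
Generation: a tokenizer with a downsampling rate of 16, producing higher-resolution images with finer detail.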
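As a rough illustration of what a downsampling rate of 16 means, the sketch below counts how many discrete tokens one image maps to. It assumes the generation path also works at 384×384 pixels, which the description above states only for the understanding path; `image_token_count` is a hypothetical helper, not part of the Janus code.

```python
def image_token_count(height: int, width: int, downsample: int = 16) -> int:
    """Number of tokens produced when each side is reduced by `downsample`."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be a multiple of the downsampling rate")
    return (height // downsample) * (width // downsample)


# Assumption: a 384x384 image, matching the input size quoted for understanding.
tokens = image_token_count(384, 384)
print(tokens)  # 576, i.e. a 24 x 24 grid of tokens
```

Under that assumption, the autoregressive model has to predict a few hundred tokens per image, which is one reason a small model can generate images on modest hardware.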
The model is available on Hugging Face in the following versions (you can also check ModelScope for the corresponding downloads):

Model            Sequence length   Download
Janus-1.3B       4096              Hugging Face
JanusFlow-1.3B   4096              Hugging Face
Janus-Pro-1B     4096              Hugging Face
Janus-Pro-7B     4096              Hugging Face

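With only 6 GB of VRAM, I could only use the Pro-1B version. It is probably somewhat weaker than Pro-7B, but the results look acceptable.
Place the model files in the ComfyUI/models/Janus-Pro/Janus-Pro-1B directory.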
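One way to fetch the files into that directory is the `huggingface_hub` library. This is a minimal sketch, assuming `huggingface_hub` is installed and that your ComfyUI install lives in a folder named `ComfyUI` (a placeholder); `janus_model_dir` and `download_janus` are hypothetical helper names.

```python
from pathlib import Path


def janus_model_dir(comfyui_root: str, repo_id: str = "deepseek-ai/Janus-Pro-1B") -> Path:
    """Build <root>/models/Janus-Pro/<model name>, the layout described above."""
    model_name = repo_id.split("/")[-1]
    return Path(comfyui_root) / "models" / "Janus-Pro" / model_name


def download_janus(comfyui_root: str, repo_id: str = "deepseek-ai/Janus-Pro-1B") -> Path:
    """Download the whole model repository into the ComfyUI model folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = janus_model_dir(comfyui_root, repo_id)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Only prints the target path; call download_janus("ComfyUI") to actually download.
print(janus_model_dir("ComfyUI").as_posix())  # ComfyUI/models/Janus-Pro/Janus-Pro-1B
```

Downloading manually from the model page and copying the files into the same folder works just as well.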

Installing the comfyui-janus plugin:

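ComfyUI is a powerful AI image-generation tool that implements many functions through node-based workflows. I won't cover it in detail here; see the official documentation if you are interested.
ComfyUI now has nodes that support Janus. Search for the comfyui-janus plugin in ComfyUI Manager, install it, and restart ComfyUI.
You can also install it from the plugin's URL (https://github.com/CY-CHENYUE/ComfyUI-Janus-Pro).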
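Installing from the URL usually means cloning the repository into ComfyUI's `custom_nodes` folder. The sketch below assumes `git` is on the PATH and that `ComfyUI` is a placeholder for your install path; `clone_command` and `install_plugin` are hypothetical helpers. Any extra Python dependencies should be installed as described in the plugin's own README.

```python
import subprocess
from pathlib import Path

PLUGIN_URL = "https://github.com/CY-CHENYUE/ComfyUI-Janus-Pro"


def clone_command(comfyui_root: str) -> list[str]:
    """Build the git command that clones the plugin into custom_nodes."""
    target = Path(comfyui_root) / "custom_nodes" / "ComfyUI-Janus-Pro"
    return ["git", "clone", PLUGIN_URL, target.as_posix()]


def install_plugin(comfyui_root: str) -> None:
    """Run the clone; raises CalledProcessError if git fails."""
    subprocess.run(clone_command(comfyui_root), check=True)


# Shows the command without running it; call install_plugin("ComfyUI") to clone.
print(" ".join(clone_command("ComfyUI")))
```

Restart ComfyUI afterwards so the new nodes are registered.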

Workflow design:

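1. The Load Image node loads an image for the model to describe.
2. Janus is a single model that supports both "image description" and "image generation", so one model loader node serves both, and the model comes with its own encoder and decoder.
3. The image understanding node describes the image; a Show Text node displays the description.
4. The translate node translates the description into English; a Show Text node displays the translation.
5. The image generation node generates the image; a Preview Image node displays the result.
6. The English text from step 4 is fed to the image generation node to produce a new image. You can also edit the text first (for example, replacing the tomato in the description with an apple) and then generate.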
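The data flow through these nodes can be traced from the links in the exported workflow, where each linked input is a pair `[source_node_id, output_index]`. The sketch below copies only the link inputs from the article's export (node IDs 15 to 34) and derives one valid execution order; `is_link` and `execution_order` are hypothetical helpers, not ComfyUI functions.

```python
# Link inputs copied from the exported workflow: [source_node_id, output_index].
LINKS = {
    "15": {"prompt": ["31", 0], "model": ["16", 0], "processor": ["16", 1]},  # image generation
    "16": {},                                                                # model loader
    "18": {"images": ["15", 0]},                                             # preview image
    "19": {"model": ["16", 0], "processor": ["16", 1], "image": ["20", 0]},  # image understanding
    "20": {},                                                                # load image
    "21": {"text": ["19", 0]},                                               # show description
    "31": {"text": ["34", 0]},                                               # show translation
    "34": {"text": ["19", 0]},                                               # translate
}


def is_link(value) -> bool:
    """A link is a two-item list: source node id (str) and output index (int)."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))


def execution_order(graph: dict) -> list:
    """Depth-first topological order: each node comes after the nodes it reads from."""
    order, seen = [], set()

    def visit(node_id: str) -> None:
        if node_id in seen:
            return
        seen.add(node_id)
        for value in graph[node_id].values():
            if is_link(value):
                visit(value[0])
        order.append(node_id)

    for node_id in sorted(graph):
        visit(node_id)
    return order


print(execution_order(LINKS))
# ['16', '20', '19', '34', '31', '15', '18', '21']
```

Read in that order, the model and image are loaded first, the description (19) feeds the translation (34), and the displayed translation (31) becomes the prompt for generation (15).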
The workflow exported in API format is saved as comfyui-janus-pro-1b-workflow.json:

{
  "15": {
    "inputs": {
      "prompt": [
        "31",
        0
      ],
      "seed": 1050186241727179,
      "batch_size": 1,
      "cfg_weight": 5,
      "temperature": 1,
      "top_p": 0.95,
      "speak_and_recognation": true,
      "model": [
        "16",
        0
      ],
      "processor": [
        "16",
        1
      ]
    },
    "class_type": "JanusImageGeneration",
    "_meta": {
      "title": "Janus Image Generation"
    }
  },
  "16": {
    "inputs": {
      "model_name": "deepseek-ai/Janus-Pro-1B"
    },
    "class_type": "JanusModelLoader",
    "_meta": {
      "title": "Janus Model Loader"
    }
  },
  "18": {
    "inputs": {
      "images": [
        "15",
        0
      ]
    },
    "class_type": "PreviewImage",
    "_meta": {
      "title": "预览图像"
    }
  },
  "19": {
    "inputs": {
      "question": "用中文描述这张图的详细内容",
      "seed": 951987545857126,
      "temperature": 0.1,
      "top_p": 0.95,
      "max_new_tokens": 512,
      "speak_and_recognation": true,
      "model": [
        "16",
        0
      ],
      "processor": [
        "16",
        1
      ],
      "image": [
        "20",
        0
      ]
    },
    "class_type": "JanusImageUnderstanding",
    "_meta": {
      "title": "Janus Image Understanding"
    }
  },
  "20": {
    "inputs": {
      "image": "采摘 (3).jpg",
      "upload": "image"
    },
    "class_type": "LoadImage",
    "_meta": {
      "title": "加载图像"
    }
  },
  "21": {
    "inputs": {
      "text": [
        "19",
        0
      ],
      "text2": "这张图片展示了一个放在深色木质桌面上的小番茄。小番茄呈圆形，颜色鲜红，顶部有绿色的叶子。背景简单，没有其他物品，光线从左上方照射下来，使小番茄的表面反射出一些光亮。整体画面简洁而清晰，突出了小番茄的鲜艳色彩和自然质感。"
    },
    "class_type": "ShowText|pysssss",
    "_meta": {
      "title": "展示文本"
    }
  },
  "31": {
    "inputs": {
      "text": [
        "34",
        0
      ],
      "text2": "This image shows a small tomato placed on a dark wooden table. The small tomato is round and bright red, with green leaves on top. The background is simple, with no other objects. The light shines from the upper left, reflecting some brightness on the surface of the small tomato. The overall scene is simple and clear, highlighting the vivid color and natural texture of the small tomato."
    },
    "class_type": "ShowText|pysssss",
    "_meta": {
      "title": "展示文本"
    }
  },
  "34": {
    "inputs": {
      "from_translate": "zh-CN",
      "to_translate": "en",
      "model": "glm-4-flash",
      "max_tokens": 1024,
      "temperature": 0.95,
      "top_p": 0.7,
      "text": [
        "19",
        0
      ]
    },
    "class_type": "ChatGLM4TranslateTextNode",
    "_meta": {
      "title": "ChatGLM-4 Translate Text Node"
    }
  }
}
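A file in this format can be queued on a running ComfyUI server by posting it to the `/prompt` endpoint. The sketch below also shows the "swap the tomato for an apple" idea by replacing the linked prompt on node 15 with literal text. It assumes the server listens on the default `http://127.0.0.1:8188` and that the generation node accepts a literal string for `prompt`; `override_prompt` and `queue_workflow` are hypothetical helper names.

```python
import json
import urllib.request


def override_prompt(workflow: dict, text: str, node_id: str = "15") -> dict:
    """Return a copy whose generation node takes literal text instead of a link."""
    patched = json.loads(json.dumps(workflow))  # deep copy via JSON round-trip
    patched[node_id]["inputs"]["prompt"] = text
    return patched


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Trimmed stand-in for node 15 of the export, enough to show the override.
demo = {"15": {"inputs": {"prompt": ["31", 0], "seed": 1050186241727179},
               "class_type": "JanusImageGeneration"}}
patched = override_prompt(demo, "A small red apple on a dark wooden table.")
print(patched["15"]["inputs"]["prompt"])

# Typical use (needs a running ComfyUI server, so it is left commented out):
# with open("comfyui-janus-pro-1b-workflow.json", encoding="utf-8") as f:
#     workflow = json.load(f)
# print(queue_workflow(override_prompt(workflow, "A small red apple on a dark wooden table.")))
```

The description and translation nodes stay in the graph after the override, so they may still run; remove them from the dictionary if only generation is wanted.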