Running Qwen3.6-35B-A3B inference on NVIDIA H20

javahust


javahust · Published 2026-04-21 15:48:38

GPU

2 × H20 (96 GB each)

Launch command

docker run -it --rm \
  --name sglang-qwen36-35b \
  --gpus all \
  --shm-size 16GB \
  -p 30083:8000 \
  -v /data/models:/data/models \
  docker.m.daocloud.io/lmsysorg/sglang:v0.5.10 \
  python3 -m sglang.launch_server \
    --model-path /data/models/Qwen3.6-35B-A3B \
    --served-model-name Qwen3.6-35B-A3B \
    --host 0.0.0.0 \
    --port 8000 \
    --tp-size 2 \
    --context-length 1000 \
    --mem-fraction-static 0.8
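
The command mixes two layers of options: everything before the image name is for Docker, everything after it is for SGLang. The sketch below rebuilds the same command from two small helpers so the split is easier to see. The helper names `docker_args` and `sglang_args` are made up for illustration; all flags and values are the ones used above. It only prints the command and does not start a container.

```python
import shlex


def docker_args(name, host_port, container_port, model_dir, image):
    """Options consumed by `docker run` (hypothetical helper)."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", name,
        "--gpus", "all",
        "--shm-size", "16GB",
        # host_port on the machine is forwarded to container_port inside it
        "-p", f"{host_port}:{container_port}",
        # mount the model directory at the same path inside the container
        "-v", f"{model_dir}:{model_dir}",
        image,
    ]


def sglang_args(model_path, served_name, port, tp_size,
                context_length, mem_fraction_static):
    """Options consumed by sglang.launch_server (hypothetical helper)."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--served-model-name", served_name,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--tp-size", str(tp_size),
        "--context-length", str(context_length),
        "--mem-fraction-static", str(mem_fraction_static),
    ]


cmd = docker_args(
    name="sglang-qwen36-35b",
    host_port=30083,
    container_port=8000,
    model_dir="/data/models",
    image="docker.m.daocloud.io/lmsysorg/sglang:v0.5.10",
) + sglang_args(
    model_path="/data/models/Qwen3.6-35B-A3B",
    served_name="Qwen3.6-35B-A3B",
    port=8000,
    tp_size=2,
    context_length=1000,
    mem_fraction_static=0.8,
)

# Print a copy-pasteable one-line version of the command.
print(shlex.join(cmd))
```

Note that the server listens on port 8000 inside the container while clients reach it through host port 30083, and that `--tp-size 2` matches the two GPUs exposed by `--gpus all`.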

Startup log

(base) root@node-gpu01:~# docker run -it --rm \
>   --name sglang-qwen36-35b \
>   --gpus all \
>   --shm-size 16GB \
>   -p 30083:8000 \
>   -v /data/models:/data/models \
>   docker.m.daocloud.io/lmsysorg/sglang:v0.5.10 \
>   python3 -m sglang.launch_server \
>   --model-path /data/models/Qwen3.6-35B-A3B \
>   --served-model-name Qwen3.6-35B-A3B \
>   --host 0.0.0.0 \
>   --port 8000 \
>   --tp-size 2 \
>   --context-length 1000 \
>   --mem-fraction-static 0.8

==========
== CUDA ==
==========

CUDA Version 12.9.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/sgl-workspace/sglang/python/sglang/launch_server.py:51: UserWarning: 'python -m sglang.launch_server' is still supported, but 'sglang serve' is the recommended entrypoint.
  Example: sglang serve --model-path <model> [options]
  warnings.warn(
Disabling overlap schedule since mamba no_buffer is not compatible with overlap schedule, try to use --disable-radix-cache if overlap schedule is necessary
/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py:172: FastAPIDeprecationWarning: ORJSONResponse is deprecated, FastAPI now serializes data directly to JSON bytes via Pydantic when a return type or response model is set, which is faster and doesn't need a custom response class. Read more in the FastAPI docs: https://fastapi.tiangolo.com/advanced/custom-response/#orjson-or-response-model and https://fastapi.tiangolo.com/tutorial/response-model/
  from sglang.srt.utils.json_response import (
[2026-04-21 07:40:25] server_args=ServerArgs(model_path='/data/models/Qwen3.6-35B-A3B', tokenizer_path='/data/models/Qwen3.6-35B-A3B', tokenizer_mode='auto', tokenizer_worker_num=1, skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=False, context_length=1000, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='0.0.0.0', port=8000, fastapi_root_path='', grpc_mode=False, skip_server_warmup=False, warmups=None, nccl_port=None, checkpoint_engine_wait_weights_before_ready=False, ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_keyfile_password=None, enable_ssl_refresh=False, dtype='auto', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', enable_fp32_lm_head=False, modelopt_quant=None, modelopt_checkpoint_restore_path=None, modelopt_checkpoint_save_path=None, modelopt_export_path=None, quantize_and_serve=False, rl_quant_profile=None, mem_fraction_static=0.8, max_running_requests=None, max_queued_requests=None, max_total_tokens=None, chunked_prefill_size=8192, enable_dynamic_chunking=False, max_prefill_tokens=16384, prefill_max_requests=None, schedule_policy='fcfs', enable_priority_scheduling=False, disable_priority_preemption=False, default_priority_value=None, abort_on_priority_when_disabled=False, schedule_low_priority_values_first=False, priority_scheduling_preemption_threshold=10, schedule_conservativeness=1.0, page_size=1, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, radix_eviction_policy='lru', enable_prefill_delayer=False, prefill_delayer_max_delay_passes=30, prefill_delayer_token_usage_low_watermark=None, prefill_delayer_forward_passes_buckets=None, prefill_delayer_wait_seconds_buckets=None, device='cuda', tp_size=2, pp_size=1, pp_max_micro_batch_size=None, pp_async_batch_depth=0, stream_interval=1, stream_response_default_include_usage=False, incremental_streaming_output=False, enable_streaming_session=False, random_seed=452077902, 
constrained_json_whitespace_pattern=None, constrained_json_disable_any_whitespace=False, watchdog_timeout=300, soft_watchdog_timeout=None, dist_timeout=None, download_dir=None, model_checksum=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, use_ray=False, custom_sigquit_handler=None, log_level='info', log_level_http=None, log_requests=False, log_requests_level=2, log_requests_format='text', log_requests_target=None, uvicorn_access_log_exclude_prefixes=[], crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_mfu_metrics=False, enable_metrics_for_all_schedulers=False, tokenizer_metrics_custom_labels_header='x-custom-labels', tokenizer_metrics_allowed_custom_labels=None, extra_metric_labels=None, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, prompt_tokens_buckets=None, generation_tokens_buckets=None, gc_warning_threshold_secs=0.0, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, enable_trace=False, otlp_traces_endpoint='localhost:4317', export_metrics_to_file=False, export_metrics_to_file_dir=None, api_key=None, admin_api_key=None, served_model_name='Qwen3.6-35B-A3B', weight_version='default', chat_template=None, hf_chat_template_name=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, tool_server=None, sampling_defaults='model', dp_size=1, load_balance_method='round_robin', attn_cp_size=1, moe_dp_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, enable_lora_overlap_loading=None, max_lora_rank=None, lora_target_modules=None, lora_paths=None, max_loaded_loras=None, max_loras_per_batch=8, lora_eviction_policy='lru', lora_backend='csgmv', max_lora_chunk_size=16, experts_shared_outer_loras=None, attention_backend='fa3', decode_attention_backend=None, 
prefill_attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', mm_attention_backend=None, fp8_gemm_runner_backend='auto', fp4_gemm_runner_backend='auto', nsa_prefill_backend=None, nsa_decode_backend=None, disable_flashinfer_autotune=False, mamba_backend='triton', speculative_algorithm=None, speculative_draft_model_path=None, speculative_draft_model_revision=None, speculative_draft_load_format=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, speculative_attention_mode='prefill', speculative_draft_attention_backend=None, speculative_moe_runner_backend='auto', speculative_moe_a2a_backend=None, speculative_draft_model_quantization=None, speculative_ngram_min_bfs_breadth=1, speculative_ngram_max_bfs_breadth=10, speculative_ngram_match_type='BFS', speculative_ngram_max_trie_depth=18, speculative_ngram_capacity=10000000, enable_multi_layer_eagle=False, ep_size=1, moe_a2a_backend='none', moe_runner_backend='auto', flashinfer_mxfp4_moe_precision='default', enable_flashinfer_allreduce_fusion=False, enforce_disable_flashinfer_allreduce_fusion=False, enable_aiter_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm=None, init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, eplb_min_rebalancing_utilization_threshold=1.0, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, elastic_ep_backend=None, enable_elastic_expert_backup=False, mooncake_ib_device=None, max_mamba_cache_size=None, mamba_ssm_dtype=None, mamba_full_memory_ratio=0.9, mamba_scheduler_strategy='no_buffer', mamba_track_interval=256, linear_attn_backend='triton', 
linear_attn_decode_backend=None, linear_attn_prefill_backend=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through', hicache_io_backend='kernel', hicache_mem_layout='layer_first', hicache_storage_backend=None, hicache_storage_prefetch_policy='best_effort', hicache_storage_backend_extra_config=None, enable_hisparse=False, hisparse_config=None, enable_lmcache=False, kt_weight_path=None, kt_method='AMXINT4', kt_cpuinfer=None, kt_threadpool_count=2, kt_num_gpu_experts=None, kt_max_deferred_experts_per_token=None, dllm_algorithm=None, dllm_algorithm_config=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, cpu_offload_gb=0, offload_group_size=-1, offload_num_in_group=1, offload_prefetch_step=1, offload_mode='cpu', multi_item_scoring_delimiter=None, disable_radix_cache=False, cuda_graph_max_bs=256, cuda_graph_bs=[1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256], disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_layerwise_nvtx_marker=False, enable_nccl_nvls=False, enable_symm_mem=False, disable_flashinfer_cutlass_moe_fp4_allgather=False, enable_tokenizer_batch_encode=False, disable_tokenizer_batch_decode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, enable_torch_symm_mem=False, pre_warm_nccl=False, disable_overlap_schedule=True, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_single_batch_overlap=False, tbo_token_distribution_threshold=0.48, enable_torch_compile=False, disable_piecewise_cuda_graph=True, enforce_piecewise_cuda_graph=False, enable_torch_compile_debug_mode=False, 
torch_compile_max_bs=32, piecewise_cuda_graph_max_tokens=8192, piecewise_cuda_graph_tokens=[4, 8, 12, 16, 20, 24, 28, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 832, 896, 960, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192], piecewise_cuda_graph_compiler='eager', torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, triton_attention_split_tile_size=None, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, enable_weights_cpu_backup=False, enable_draft_weights_cpu_backup=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, keep_mm_feature_on_device=False, enable_return_hidden_states=False, enable_return_routed_experts=False, scheduler_recv_interval=1, numa_node=None, enable_deterministic_inference=False, rl_on_policy_target=None, enable_attn_tp_input_scattered=False, gc_threshold=None, enable_nsa_prefill_context_parallel=False, nsa_prefill_cp_mode='round-robin-split', enable_fused_qk_norm_rope=False, enable_precise_embedding_interpolation=False, enable_fused_moe_sum_all_reduce=False, enable_prefill_context_parallel=False, prefill_cp_mode='in-seq-split', enable_dynamic_batch_tokenizer=False, dynamic_batch_tokenizer_batch_size=32, dynamic_batch_tokenizer_batch_timeout=0.002, debug_tensor_dump_output_folder=None, debug_tensor_dump_layers=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_ib_device=None, disaggregation_decode_enable_offload_kvcache=False, 
num_reserved_decode_tokens=512, disaggregation_decode_polling_interval=1, encoder_only=False, language_only=False, encoder_transfer_backend='zmq_to_scheduler', encoder_urls=[], enable_adaptive_dispatch_to_encoder=False, custom_weight_loader=[], weight_loader_disable_mmap=False, remote_instance_weight_loader_seed_instance_ip=None, remote_instance_weight_loader_seed_instance_service_port=None, remote_instance_weight_loader_send_weights_group_ports=None, remote_instance_weight_loader_backend='nccl', remote_instance_weight_loader_start_seed_via_transfer_engine=False, engine_info_bootstrap_port=6789, modelexpress_config=None, enable_pdmux=False, pdmux_config_path=None, sm_group_num=8, enable_broadcast_mm_inputs_process=False, enable_prefix_mm_cache=False, mm_enable_dp_encoder=False, mm_process_config={}, limit_mm_data_per_request=None, enable_mm_global_cache=False, decrypted_config_file=None, decrypted_draft_config_file=None, forward_hooks=None)
[2026-04-21 07:40:28] Using default HuggingFace chat template with detected content format: openai
[2026-04-21 07:40:34 TP0] Init torch distributed begin.
[2026-04-21 07:40:34 TP1] Init torch distributed begin.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-04-21 07:40:34 TP0] sglang is using nccl==2.28.3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-04-21 07:40:35 TP0] Init torch distributed ends. elapsed=0.58 s, mem usage=0.90 GB
[2026-04-21 07:40:35 TP1] Init torch distributed ends. elapsed=0.44 s, mem usage=0.90 GB
[2026-04-21 07:40:35 TP1] Load weight begin. avail mem=91.86 GB
[2026-04-21 07:40:35 TP0] Load weight begin. avail mem=88.01 GB
[2026-04-21 07:40:35 TP1] Multimodal attention backend not set. Use fa3.
[2026-04-21 07:40:35 TP1] Using fa3 as multimodal attention backend.
[2026-04-21 07:40:35 TP0] Multimodal attention backend not set. Use fa3.
[2026-04-21 07:40:35 TP0] Using fa3 as multimodal attention backend.
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
[2026-04-21 07:40:35 TP1] using attn output gate!
[2026-04-21 07:40:35 TP0] using attn output gate!
Multi-thread loading shards: 100% Completed | 26/26 [02:34<00:00,  5.96s/it]
[2026-04-21 07:43:10 TP0] Load weight end. elapsed=155.05 s, type=Qwen3_5MoeForConditionalGeneration, avail mem=55.20 GB, mem usage=32.80 GB.
[2026-04-21 07:43:10 TP1] Load weight end. elapsed=155.06 s, type=Qwen3_5MoeForConditionalGeneration, avail mem=59.05 GB, mem usage=32.80 GB.
[2026-04-21 07:43:10 TP0] Using KV cache dtype: torch.bfloat16
[2026-04-21 07:43:10 TP1] Mamba Cache is allocated. max_mamba_cache_size: 593, conv_state size: 0.41GB, ssm_state size: 17.40GB 
[2026-04-21 07:43:10 TP0] Mamba Cache is allocated. max_mamba_cache_size: 593, conv_state size: 0.41GB, ssm_state size: 17.40GB 
[2026-04-21 07:43:10 TP1] KV Cache is allocated. #tokens: 2078110, K size: 9.91 GB, V size: 9.91 GB
[2026-04-21 07:43:10 TP0] KV Cache is allocated. #tokens: 2078110, K size: 9.91 GB, V size: 9.91 GB
[2026-04-21 07:43:10 TP1] Memory pool end. avail mem=21.38 GB
[2026-04-21 07:43:10 TP0] Memory pool end. avail mem=17.53 GB
[2026-04-21 07:43:10 TP1] Using hybrid linear attention backend for hybrid GDN models.
[2026-04-21 07:43:10 TP1] Capture cuda graph begin. This can take up to several minutes. avail mem=21.29 GB
[2026-04-21 07:43:10 TP0] Linear attention kernel backend: decode=triton, prefill=triton
[2026-04-21 07:43:10 TP0] Using hybrid linear attention backend for hybrid GDN models.
[2026-04-21 07:43:10 TP0] GDN kernel dispatcher: decode=TritonGDNKernel, extend=TritonGDNKernel, verify=TritonGDNKernel packed_decode=True
[2026-04-21 07:43:10 TP0] Capture cuda graph begin. This can take up to several minutes. avail mem=17.44 GB
[2026-04-21 07:43:10 TP0] Capture cuda graph bs [1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 197]
Capturing batches (bs=197 avail_mem=17.25 GB):   0%|                                                                                  | 0/29 [00:00<?, ?it/s]2026-04-21 07:43:11,941 - CUTE_DSL - WARNING - [handle_import_error] - Unexpected error during package walk: cutlass.cute.experimental
[2026-04-21 07:43:11 TP1] Unexpected error during package walk: cutlass.cute.experimental
2026-04-21 07:43:11,961 - CUTE_DSL - WARNING - [handle_import_error] - Unexpected error during package walk: cutlass.cute.experimental
[2026-04-21 07:43:11 TP0] Unexpected error during package walk: cutlass.cute.experimental
[2026-04-21 07:43:13 TP0] Using default MoE kernel config. Performance might be sub-optimal! Config file not found at /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H20.json, you can create them with https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
[2026-04-21 07:43:13 TP0] Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H20_down.json, you can create them with https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
[2026-04-21 07:43:13 TP1] Using default MoE kernel config. Performance might be sub-optimal! Config file not found at /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H20.json, you can create them with https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
[2026-04-21 07:43:13 TP1] Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H20_down.json, you can create them with https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
Capturing batches (bs=1 avail_mem=16.79 GB): 100%|███████████████████████████████████████████████████████████████████████████| 29/29 [00:17<00:00,  1.68it/s]
[2026-04-21 07:43:28 TP0] Registering 2349 cuda graph addresses
[2026-04-21 07:43:28 TP1] Capture cuda graph end. Time elapsed: 17.99 s. mem usage=0.66 GB. avail mem=20.63 GB.
[2026-04-21 07:43:28 TP1] Disable piecewise CUDA graph because --disable-piecewise-cuda-graph is set
[2026-04-21 07:43:28 TP0] Capture cuda graph end. Time elapsed: 17.99 s. mem usage=0.66 GB. avail mem=16.78 GB.
[2026-04-21 07:43:28 TP0] Disable piecewise CUDA graph because --disable-piecewise-cuda-graph is set
[2026-04-21 07:43:30 TP0] max_total_num_tokens=2078110, chunked_prefill_size=8192, max_prefill_tokens=16384, max_running_requests=197, context_len=1000, available_gpu_mem=16.78 GB
[2026-04-21 07:43:31] INFO:     Started server process [1]
[2026-04-21 07:43:31] INFO:     Waiting for application startup.
[2026-04-21 07:43:31] Using default chat sampling params from model generation config: {'temperature': 1.0, 'top_k': 20, 'top_p': 0.95}
[2026-04-21 07:43:31] INFO:     Application startup complete.
[2026-04-21 07:43:31] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2026-04-21 07:43:32] INFO:     127.0.0.1:33958 - "GET /model_info HTTP/1.1" 200 OK
[2026-04-21 07:43:44 TP0] Prefill batch, #new-seq: 1, #new-token: 80, #cached-token: 0, full token usage: 0.00, mamba usage: 0.00, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 0.00
[2026-04-21 07:43:49] INFO:     127.0.0.1:33970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2026-04-21 07:43:49] The server is fired up and ready to roll!
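
The memory figures in the log can be cross-checked with a little arithmetic. The sketch below copies the TP0 numbers from the log and recomputes a few derived values. Two things here are assumptions rather than facts from the log: that "GB" in the log means 2**30 bytes, and that the relationships noted in the comments reflect how SGLang actually sizes its pools. They are only observations that happen to fit these numbers.

```python
# Figures copied from the TP0 lines of the startup log (units as printed there).
avail_before_load = 88.01        # "Load weight begin. avail mem"
avail_after_load = 55.20         # "Load weight end ... avail mem"
weights = 32.80                  # "Load weight end ... mem usage"
mamba_conv, mamba_ssm = 0.41, 17.40
kv_k, kv_v = 9.91, 9.91
avail_after_pool_logged = 17.53  # "Memory pool end. avail mem"
kv_tokens = 2_078_110            # "KV Cache is allocated. #tokens"
max_mamba_cache_size = 593
max_running_requests = 197
context_len = 1000

GIB = 1 << 30  # assumption: "GB" in the log means 2**30 bytes

mamba_total = mamba_conv + mamba_ssm
kv_total = kv_k + kv_v

# 1. Memory ledger: what should be left after both pools are allocated.
avail_after_pool_predicted = avail_after_load - mamba_total - kv_total

# 2. Share of the initially free memory taken by weights + Mamba + KV.
#    Observation only: this lands close to --mem-fraction-static 0.8.
static_fraction = (weights + mamba_total + kv_total) / avail_before_load

# 3. Mamba state relative to KV cache.
#    Observation only: close to mamba_full_memory_ratio=0.9 in server_args.
mamba_to_kv = mamba_total / kv_total

# 4. Bytes of K cache per token on one GPU.
k_bytes_per_token = kv_k * GIB / kv_tokens

# 5. Worst-case KV demand if every running request used the full context.
worst_case_tokens = max_running_requests * context_len
kv_headroom = kv_tokens / worst_case_tokens

print(f"free after pools: predicted {avail_after_pool_predicted:.2f} GB, "
      f"logged {avail_after_pool_logged:.2f} GB")
print(f"static share of free memory: {static_fraction:.3f}")
print(f"mamba / KV ratio: {mamba_to_kv:.3f}")
print(f"K cache per token per GPU: {k_bytes_per_token:.0f} bytes")
print(f"KV pool vs worst-case demand: {kv_headroom:.1f}x")
print(f"max_mamba_cache_size // 3 = {max_mamba_cache_size // 3}")
```

With these inputs the predicted free memory comes out within about 0.04 GB of the logged 17.53 GB, and the KV pool holds roughly ten times what 197 requests of 1000 tokens could use. That suggests the concurrency limit in this run comes from the Mamba cache rather than the KV cache: 593 // 3 equals the reported max_running_requests of 197, although whether SGLang derives the limit this way is not confirmed by the log.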

Test request

curl -X POST http://81.70.247.xx:30083/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.6-35B-A3B",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "你好"}
      ]}
    ],
    "max_tokens": 512,
    "stream": false
  }'
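
The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_payload` and `chat` are made up here, and the response parsing assumes the server returns the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_payload(prompt, model="Qwen3.6-35B-A3B", max_tokens=512):
    """Build the same JSON body as the curl request above."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "max_tokens": max_tokens,
        "stream": False,
    }


def chat(base_url, prompt, timeout=120):
    """POST to /v1/chat/completions and return the first choice's text.

    Assumes an OpenAI-style response body; raises urllib.error.URLError
    (or its subclass HTTPError) if the request fails.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

To use it, call `chat("http://<host>:30083", "你好")` with the address of the machine running the container. Since the server was started with `--context-length 1000`, the prompt tokens plus `max_tokens` should stay within that limit.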
