Acta Electronica Sinica ›› 2022 , Vol. 50 ›› Issue (6) : 1319-1330. DOI: 10.12263/DZXB.20210818

Special Issue: Long-abstract papers; Electromagnetic spectrum intelligence+ • Intelligent game of electromagnetic spectrum
  • About the authors:
  • RAO Ning, male, born August 1997 in Shangrao, Jiangxi. Master's student at the Information and Navigation College, Air Force Engineering University. Research interests: communication countermeasures, reinforcement learning. E-mail: [email protected]
    XU Hua, male, born April 1976 in Yichang, Hubei. Professor and doctoral supervisor at the Information and Navigation College, Air Force Engineering University. Research interests: communication countermeasures, blind signal processing. E-mail: [email protected]
    JIANG Lei, male, born June 1974 in Wuxi, Jiangsu. Associate professor and master's supervisor at the Information and Navigation College, Air Force Engineering University. Research interests: communication countermeasures, wireless communication technology. E-mail: [email protected]
    SONG Bai-lin, male, born November 1997 in Shenyang, Liaoning. Master's student at the Information and Navigation College, Air Force Engineering University. Research interests: communication countermeasures, reinforcement learning. E-mail: [email protected]
    SHI Yun-hao, male, born July 1996 in Xianyang, Shaanxi. Doctoral student at the Information and Navigation College, Air Force Engineering University. Research interests: signal recognition, deep learning. E-mail: [email protected]
  • Abstract:

    To solve the jamming power allocation problem in battlefield cooperative communication countermeasures, this paper designs a distributed cooperative jamming power allocation method based on multi-agent deep reinforcement learning. Specifically, communication jamming power allocation is modeled as a fully cooperative multi-agent task. A centralized-training, distributed-decision framework is adopted to mitigate the non-stationary environment and high decision dimensionality of the multi-agent system while also reducing the communication overhead between agents, and a maximum policy entropy criterion is introduced to control each agent's exploration efficiency. Taking the maximization of both the cumulative jamming reward and the entropy of the jamming policy as the optimization objective accelerates the learning of cooperative strategies. Simulation results indicate that the proposed distributed method effectively solves the high-dimensional cooperative jamming power allocation problem. Compared with the existing centralized allocation method, it learns faster with less volatility, and under the same conditions its jamming efficiency is 16.8% higher than that of the centralized method.

    Extended Abstract
    To solve the jamming power allocation problem that arises when multiple devices jam cooperatively in battlefield communication countermeasure scenarios, this paper designs a distributed cooperative jamming power allocation method based on multi-agent deep reinforcement learning. Specifically, communication jamming power allocation is modeled as a fully cooperative multi-agent task. Combining the advantages of centralized learning and independent learning in multi-agent systems, a centralized-training, distributed-decision framework is adopted to mitigate the non-stationary environment, high decision dimensionality, and difficult training convergence of the multi-agent system, while also reducing the communication overhead between agents, and a maximum policy entropy criterion is introduced to control each agent's exploration efficiency. Taking the maximization of both the cumulative jamming reward and the entropy of the jamming policy as the optimization objective accelerates the learning of cooperative strategies. The reward function jointly considers accomplishing the overall jamming suppression task and optimizing jamming power utilization, so that a reasonable jamming power allocation scheme can be adaptively adjusted under different jamming suppression coefficients. Simulation results indicate that the proposed distributed method effectively solves the high-dimensional cooperative jamming power allocation problem. Compared with the existing centralized allocation method, it learns faster with less volatility, and under the same conditions its jamming efficiency is 16.8% higher than that of the centralized method. An ablation experiment shows that maximum policy entropy further improves exploration efficiency and finds the optimal scheme faster.
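    The maximum-entropy objective described above — maximizing the discounted cumulative jamming reward plus an entropy bonus on each agent's policy — can be sketched as follows. This is a minimal illustration under the assumption of a discrete power-level policy; the function names and the exact per-agent formulation are illustrative, not the paper's implementation.

    ```python
    import numpy as np

    def policy_entropy(probs):
        """Shannon entropy H(pi) of a discrete power-allocation policy."""
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]  # ignore zero-probability actions (0 * log 0 -> 0)
        return float(-np.sum(p * np.log(p)))

    def soft_return(rewards, entropies, alpha=1.0, gamma=0.98):
        """Discounted sum of (reward + alpha * policy entropy):
        the entropy-regularized return the agents maximize."""
        g = 0.0
        for r, h in zip(reversed(rewards), reversed(entropies)):
            g = r + alpha * h + gamma * g
        return g
    ```

    A more uniform policy yields higher entropy, so a larger entropy coefficient alpha pushes agents toward broader exploration early in training; annealing alpha (initialized to 1 in the paper's settings) shifts the objective back toward pure jamming reward.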

    Key words: communication countermeasures, cooperative resource allocation, multi-agent deep reinforcement learning, distributed strategy, maximum policy entropy


    RAO Ning, XU Hua, JIANG Lei, SONG Bai-lin, SHI Yun-hao. Allocation Algorithm of Distributed Cooperative Jamming Power Based on Multi-Agent Deep Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(6): 1319-1330.


    Parameter | Value
    Number of jamming devices N | 3
    Number of communication links M | 5
    Max simultaneous jamming targets per device U | 2
    Total jamming bandwidth B_j | 2 MHz
    Fixed-frequency link bandwidth B_d | 50 kHz
    Frequency-hopping spacing f_i | 25 kHz · n (n = 1, 2, 3, 4)
    Max jammer radiated power P_max | 77 dBm (≈50 kW)
    Radio radiated power P_c | 55 dBm (≈300 W)
    Noise power σ² | -85 dBm
    Communication link gain G_c | 8 dB
    Jamming link gain G_j | 3 dB
    Min base-station-to-radio distance R_c | 110 km
    Max jammer-to-radio distance R_j | 300 km
    Suppression coefficient K_i | 2
    Soft-update coefficient τ | 0.01
    Training episodes E | 5 000
    Interactions per episode T | 500
    Replay buffer capacity C_RB | 2^17
    Batch size B | 256
    Discount factor γ | 0.98
    Initial entropy coefficient α | 1
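    For reference, the simulation settings in the table can be collected into a plain configuration mapping. This is a sketch only: the key names are my own, and the values are transcribed directly from the table above.

    ```python
    # Simulation parameters from the paper's table (units in comments).
    SIM_PARAMS = {
        "N": 3,             # number of jamming devices
        "M": 5,             # number of communication links
        "U": 2,             # max simultaneous jamming targets per device
        "B_j_MHz": 2,       # total jamming bandwidth
        "B_d_kHz": 50,      # fixed-frequency link bandwidth
        "f_i_kHz": 25,      # frequency-hopping spacing (x n, n = 1..4)
        "P_max_dBm": 77,    # max jammer radiated power (~50 kW)
        "P_c_dBm": 55,      # radio radiated power (~300 W)
        "sigma2_dBm": -85,  # noise power
        "G_c_dB": 8,        # communication link gain
        "G_j_dB": 3,        # jamming link gain
        "R_c_km": 110,      # min base-station-to-radio distance
        "R_j_km": 300,      # max jammer-to-radio distance
        "K_i": 2,           # suppression coefficient
        "tau": 0.01,        # soft-update coefficient
        "E": 5000,          # training episodes
        "T": 500,           # interactions per episode
        "replay_capacity": 2 ** 17,  # experience replay buffer size
        "batch_size": 256,
        "gamma": 0.98,      # discount factor
        "alpha0": 1,        # initial entropy coefficient
    }
    ```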