🧑‍🎓 About Me

I graduated in 2024 with a Bachelor’s degree in Artificial Intelligence from Chien-Shiung Wu College, Southeast University (SEU, 985 & 211). I was then admitted without examination to the Master’s program at Beijing Electronic Science and Technology Institute (BESTI). After some early research setbacks there, I initially considered a civil service career. I am fortunate, however, to be advised by Dr. Xiaojun Jia, a postdoctoral researcher at Nanyang Technological University, and to collaborate with Weixin Wang, Haoxuan Ma, and many other friends. I am also honored to be interning at Alibaba Security under the mentorship of Dr. Ranjie Duan, alongside colleagues whose guidance and collaboration have broadened my perspective and deepened my commitment to research. These ongoing experiences inspire me and reinforce my determination to pursue a Ph.D. in trustworthy AI.

📝 Selected Papers

ICLR 2026 (Under Review)
IRL with DRS

Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment [Paper] [Code]

Ruoxi Cheng, Haoxuan Ma, Weixin Wang, Ranjie Duan, Jiexi Liu, Xiaoshuang Jia, Simeng Qin, Xiaochun Cao, Yang Liu, Xiaojun Jia

Under review at ICLR 2026; NeurIPS 2025 review scores: 4-4-4-2.

EMNLP 2025 (main)
PBI-Attack

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization [Paper] [Code]

Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Ranjie Duan, Xiaoshuang Jia, Shaowei Yuan, Zhiqiang Wang, Xiaojun Jia

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025) Main Conference

ICMR 2025
Gibberish

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining [Paper] [Code]

Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shitong Shao, Zhiqiang Wang

The 15th ACM International Conference on Multimedia Retrieval (ICMR 2025)

NeurIPS 2024 Workshop
RLDF

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs [Paper] [Code]

Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

Pluralistic Alignment Workshop at NeurIPS 2024

🎖 Honors and Awards

  • 2025, Grand Prize (Rank 1), National Cybersecurity Attack and Defense Software Competition
  • 2025, Second Prize (Top 5%), National Information Security Contest & “Great Wall Cup” Information Security Triathlon
  • 2025, Second Prize, China’s Innovation Challenge on Artificial Intelligence Application Scene (CICAS 2025)
  • 2025, First‑Class Academic Scholarship at BESTI
  • 2025, Second Prize, National Software Innovation Competition — North China Region
  • 2024, Third Prize, “Huawei Cup” National Cybersecurity Innovation Competition
  • 2024, Outstanding Undergraduate Thesis (Top 3%)
  • 2021, Merit Student Award of Southeast University
  • 2021, Second Prize, National Undergraduate Mathematics Competition
  • 2018 & 2019, First Prize, Chinese Mathematical Olympiad — Jiangsu Province

📖 Education

  • 2024.09 – Present, M.S., Beijing Electronic Science and Technology Institute, Beijing, China
  • 2020.09 – 2024.06, B.S., Chien-Shiung Wu College, Southeast University, Nanjing, China

💻 Internships

Alibaba Security, Feb 2025 – Present (Supervisor: Ranjie Duan)

  • Engaged in alignment training and evaluation of LLMs, contributing to the technical report:

    • Oyster-I: Beyond Refusal — Constructive Safety Alignment for Responsible Language Models (Alibaba AAIG) [paper] (Principal Contributor, Fifth Author)
  • Co-inventor on two Alibaba Innovation Proposal patents:

    • User–Model Interactive Security Guidance Mechanism Based on Game Theory (Fifth Inventor)
    • A Method for Constructing Chinese–English Safety Evaluation Datasets Based on Inference Complexity Grading (Sixth Inventor)

Guolian Securities, Information Technology Headquarters, Jul 2023 – Sep 2023 (Supervisor: Honghui Xu)

  • Cleaned and structured financial data for knowledge-graph integration.
  • Trained and fine-tuned open-source LLMs on curated knowledge-base data.

💃 Skills & Interests

  • Languages: English – IELTS 7.0 (Listening 7.5, Reading 8.0, Writing 6.5, Speaking 6.0); CET-6: 585
  • Programming: Python, C++, PyTorch, TensorFlow, MySQL, Navicat, SPSS
  • Security: NISP Level-2 Certification, administered by China Information Security Evaluation Center
  • Chinese Dance: Level-10 Excellence; First Prize, Solo Dance Competition, Wuxi, Jiangsu Province (2015, 2018)

🌍 Social Practice

  • Jul 2021 – Aug 2021, Village Elementary School Teaching, Xiangyang, Hubei – Volunteer, National Education Support Project
  • Jun 2019 – Sep 2021, Talented Youth Initiative, Peking University – Vice Leader of Applied Science & Engineering Study Group