Imitate Before Detect

Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Jiaqi Chen1,9*, Xiaoye Zhu2,10*, Tianyang Liu5*, Ying Chen6, Xinhui Chen3,4,
Yiwen Yuan8, Chak Tou Leong8, Zuchao Li3†, Tang Long12, Lei Zhang5, Chenyu Yan, Guanghao Mei, Jie Zhang, Lefei Zhang
1Fudan University, 2South China University of Technology, 3Wuhan University, 4Fenz AI, 5UC San Diego,
6UIUC, 7CMU, 8PolyU, 9Stanford University, 10NUS (Chongqing) Research Institute,
11Georgia Tech, 12Independent researcher

*Equal contribution.

†Corresponding author.

Abstract

Large Language Models (LLMs) have revolutionized text generation, making machine-generated text increasingly difficult to detect. Although prior methods achieve strong performance on purely machine-generated text, they perform poorly on machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from the original human prompt. Because much of the content can originate from the human prompt, detecting machine-revised text often reduces to identifying distinctive machine styles, e.g., words favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within content contributed by humans. We propose the “Imitate Before Detect” (ImBD) approach, which first imitates the machine-style token distribution and then compares the distribution of the text under test with this machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce Style Preference Optimization (SPO), which aligns a scoring LLM with the stylistic preferences of machine-generated text. The aligned scoring model is then used to compute the style-conditional probability curvature (Style-CPC), which quantifies the log-probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine-revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5- and GPT-4o-revised text, respectively. Notably, it surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness. Code, data, and models are available.
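For concreteness, here is a minimal sketch of the Style-CPC quantity described above, under the assumption that it takes the same analytical form as Fast-DetectGPT's conditional probability curvature, with the SPO-aligned model $p_{\theta}$ used for both scoring and conditional sampling (the notation is illustrative):

$$
\text{Style-CPC}(x) \;=\; \frac{\log p_{\theta}(x \mid x) \;-\; \tilde{\mu}}{\tilde{\sigma}}, \qquad
\tilde{\mu} \;=\; \mathbb{E}_{\tilde{x} \sim p_{\theta}(\cdot \mid x)}\!\left[\log p_{\theta}(\tilde{x} \mid x)\right], \qquad
\tilde{\sigma}^{2} \;=\; \mathbb{E}_{\tilde{x} \sim p_{\theta}(\cdot \mid x)}\!\left[\left(\log p_{\theta}(\tilde{x} \mid x) - \tilde{\mu}\right)^{2}\right]
$$

where $\tilde{x}$ denotes tokens conditionally resampled at each position; a larger curvature indicates text that follows the machine-style distribution more closely than a typical human-written passage.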

Why Do We Need to Imitate?


(a-c) Comparative examples of human-written, machine-generated, and machine-revised text. (d) Fast-DetectGPT shows a significant drop in detection accuracy when identifying machine-revised text compared to machine-generated text. (e) Our method brings a noticeable improvement over Fast-DetectGPT in detecting machine-revised text. “Fast-Det.” denotes “Fast-DetectGPT”.

How Do We Imitate?


Imitating the stylistic preferences of LLMs. (a) Token distributions before and after machine-style imitation, showing how the scoring model is deliberately fine-tuned to bias its token distribution towards a machine writing style (e.g., shifting preference from common words like “explore” to machine-favored tokens such as “delve”). (b) The Style Preference Optimization pipeline aligns the base scoring model with the style of machine-revised content using paired human-machine texts. The resulting machine-style scoring model produces token distributions $p(x_n|x_{0:n-1})$ for each position $n$, which are subsequently used to compute style-conditional probability curvatures.
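For readers who want a concrete picture of the alignment step, the following is a minimal PyTorch sketch of one SPO update. It assumes SPO follows a DPO-style preference objective in which the machine-revised text is the preferred sample and its human-written counterpart the rejected one; the base model identifier, optimizer settings, and beta value are illustrative placeholders rather than the authors' exact configuration.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative base scoring model (GPT-Neo-2.7B is the usual scoring-model source).
model_name = "EleutherAI/gpt-neo-2.7B"
tok = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name)   # scoring model being aligned
ref = AutoModelForCausalLM.from_pretrained(model_name)      # frozen reference copy
ref.eval()

def seq_logprob(model, text):
    """Sum of token log-probabilities of `text` under a causal LM."""
    ids = tok(text, return_tensors="pt").input_ids
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)  # predict token t from tokens < t
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).sum()

beta = 0.1                                                   # assumed preference temperature
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)  # assumed learning rate

def spo_step(machine_text, human_text):
    """One DPO-style update nudging the policy towards machine-styled text."""
    with torch.no_grad():
        ref_m, ref_h = seq_logprob(ref, machine_text), seq_logprob(ref, human_text)
    pol_m, pol_h = seq_logprob(policy, machine_text), seq_logprob(policy, human_text)
    # Preference margin: machine-revised ("chosen") vs. human original ("rejected").
    margin = beta * ((pol_m - ref_m) - (pol_h - ref_h))
    loss = -F.logsigmoid(margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Iterating spo_step over paired human-machine texts yields the machine-style scoring model used for detection below.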

Impact of Style-Conditional Probability Curvature (Style-CPC)


(Left) Conditional probability curvatures (CPC) from Fast-DetectGPT (denoted as “Fast-Det.”) applied to purely machine-generated text; (Middle) conditional probability curvatures applied to machine-revised text; (Right) style-conditional probability curvatures from our method applied to machine-revised text. The greater the separation between human-written texts (red) and machine-revised texts (blue), the more effective the detection.
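The detection side can then be sketched as follows, assuming Style-CPC keeps the analytical sampling scheme of Fast-DetectGPT, with the SPO-aligned model supplying both the scoring distribution and the conditional sampling distribution; the decision threshold is a hypothetical placeholder that would normally be chosen on validation data.

import torch
import torch.nn.functional as F

@torch.no_grad()
def style_cpc(model, tok, text):
    """Style-conditional probability curvature of `text` under the aligned scoring model."""
    ids = tok(text, return_tensors="pt").input_ids
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)  # next-token distributions
    probs = logp.exp()
    target = ids[:, 1:]

    log_likelihood = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1).sum()
    # Mean and variance of the log-probability when tokens are resampled from the model itself.
    mean = (probs * logp).sum(-1).sum()
    var = (probs * logp.pow(2)).sum(-1).sum() - (probs * logp).sum(-1).pow(2).sum()
    return ((log_likelihood - mean) / var.sqrt()).item()

def is_machine_revised(model, tok, text, threshold=0.0):     # threshold is a placeholder
    return style_cpc(model, tok, text) > threshold

A higher curvature means the text sits closer to the machine-style distribution, so a call such as is_machine_revised(policy, tok, candidate_text) flags a passage once its curvature exceeds the chosen threshold.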

Leaderboard

| Method | Time cost (s/1k words) | GPT-3.5 XSum | GPT-3.5 Writing | GPT-3.5 PubMed | GPT-3.5 Avg. | GPT-4o XSum | GPT-4o Writing | GPT-4o PubMed | GPT-4o Avg. |
|---|---|---|---|---|---|---|---|---|---|
| RoBERTa-base | 0.07 | 0.5806 | 0.7225 | 0.4370 | 0.5800 | 0.4921 | 0.4774 | 0.2496 | 0.4064 |
| RoBERTa-large | 0.11 | 0.6391 | 0.7236 | 0.4848 | 0.6158 | 0.4782 | 0.4708 | 0.3089 | 0.4193 |
| Likelihood | 0.38 | 0.4982 | 0.8788 | 0.5528 | 0.6433 | 0.4396 | 0.8077 | 0.4596 | 0.5690 |
| Entropy | 0.35 | 0.6742 | 0.3021 | 0.5662 | 0.5142 | 0.6122 | 0.2802 | 0.5899 | 0.4941 |
| LogRank | 0.36 | 0.4711 | 0.8496 | 0.5597 | 0.6268 | 0.4002 | 0.7694 | 0.4472 | 0.5389 |
| LRR | 0.41 | 0.4016 | 0.7203 | 0.5629 | 0.5616 | 0.3095 | 0.6214 | 0.4710 | 0.4673 |
| DNA-GPT | 35.92 | 0.5338 | 0.8439 | 0.3333 | 0.5703 | 0.4974 | 0.7478 | 0.3151 | 0.5201 |
| NPR | 111.99 | 0.5659 | 0.8786 | 0.4246 | 0.6230 | 0.5065 | 0.8444 | 0.3740 | 0.5750 |
| DetectGPT | 111.33 | 0.6343 | 0.8793 | 0.5608 | 0.6915 | 0.6217 | 0.8771 | 0.5612 | 0.6867 |
| Fast-DetectGPT | 0.72 | 0.7312 | 0.9304 | 0.7182 | 0.7933 | 0.6293 | 0.8324 | 0.6175 | 0.6931 |
| ImBD (Ours) | 0.72 | 0.9849 | 0.9871 | 0.8626 | 0.9449 | 0.9486 | 0.9468 | 0.7743 | 0.8899 |

Detection of GPT-3.5 and GPT-4o polished text. Typically, GPT-Neo-2.7B (Black et al. 2021) serves as the scoring model. NPR and DetectGPT use T5-3B (Raffel et al. 2020) to generate perturbations, whereas Fast-DetectGPT employs GPT-J (Wang and Komatsuzaki 2021) as a surrogate model to generate samples. Metric: AUROC.

| Method | XSum | Writing | PubMed | Avg. |
|---|---|---|---|---|
| GPTZero | 0.9542 | 0.9711 | 0.8800 | 0.9351 |
| ImBD (Ours) | 0.9849 | 0.9871 | 0.8626 | 0.9449 |

Comparison with GPTZero on detecting GPT-3.5-polished text. Metric: AUROC.

Performance on Diverse Tasks

| Method | Rewrite | Expand | Polish | Generate | Avg. |
|---|---|---|---|---|---|
| Likelihood | 0.4073 | 0.4564 | 0.6039 | 0.8939 | 0.5904 |
| Entropy | 0.5840 | 0.6629 | 0.5431 | 0.4129 | 0.5507 |
| LogRank | 0.3868 | 0.4273 | 0.5864 | 0.8925 | 0.5732 |
| LRR | 0.3488 | 0.3581 | 0.5183 | 0.8541 | 0.5198 |
| DNA-GPT | 0.4101 | 0.4901 | 0.5847 | 0.8931 | 0.5945 |
| NPR | 0.3606 | 0.5139 | 0.5673 | 0.8541 | 0.5740 |
| DetectGPT | 0.4060 | 0.6000 | 0.6615 | 0.8985 | 0.6415 |
| Fast-DetectGPT | 0.4499 | 0.7159 | 0.7989 | 0.9706 | 0.7338 |
| ImBD (Ours) | 0.8739 | 0.9758 | 0.9707 | 0.9996 | 0.9550 |

We evaluated the detection performance, measured by average AUROC, of text revised by leading LLMs (Qwen2-7B, Llama-3-8B, Mixtral-7B, Deepseek-7B, GPT-3.5, and GPT-4o) on the XSum dataset.

Extra Experiment on Text Length


Evaluations of detection accuracy for XSum polished texts trimmed to the specified word count.

More Visualization



ROC curves in log scale evaluated on the polish task of the XSum dataset, where the dashed lines denote the random classifier. “Fast-Det.” denotes “Fast-DetectGPT”.

BibTeX

@misc{chen2024imitatedetectaligningmachine,
      title={Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection}, 
      author={Jiaqi Chen and Xiaoye Zhu and Tianyang Liu and Ying Chen and Xinhui Chen and Yiwen Yuan and Chak Tou Leong and Zuchao Li and Tang Long and Lei Zhang and Chenyu Yan and Guanghao Mei and Jie Zhang and Lefei Zhang},
      year={2024},
      eprint={2412.10432},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.10432}, 
}