Imitate Before Detect

Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Jiaqi Chen1,9*, Xiaoye Zhu2,10*, Tianyang Liu5*, Ying Chen6, Xinhui Chen3,4,
Yiwen Yuan8, Chak Tou Leong8, Zuchao Li3†, Tang Long12, Lei Zhang5, Chenyu Yan, Guanghao Mei, Jie Zhang, Lefei Zhang
1Fudan University, 2South China University of Technology, 3Wuhan University, 4Fenz AI, 5UC San Diego,
6UIUC, 7CMU, 8PolyU, 9Stanford University, 10NUS (Chongqing) Research Institute,
11Georgia Tech, 12Independent researcher

*Equal contribution.

†Corresponding author.

Abstract

Large Language Models (LLMs) have revolutionized text generation, making machine-generated text increasingly difficult to detect. Although prior methods achieve strong performance on purely machine-generated text, they perform poorly on machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from the original human prompt. Because much of the content can originate from the human prompt, detecting machine-revised text often reduces to identifying distinctive machine styles, e.g., words favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within content contributed by humans. We propose the “Imitate Before Detect” (ImBD) approach, which first imitates the machine-style token distribution and then compares the distribution of the text under test with this machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce Style Preference Optimization (SPO), which aligns a scoring LLM with the stylistic preferences of machine-generated text. The aligned scoring model is then used to compute the style-conditional probability curvature (Style-CPC), which quantifies the log-probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine-revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5- and GPT-4o-revised text, respectively. Notably, it surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness. Code, data, and models are available.
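For concreteness, here is a minimal sketch of the Style-CPC quantity described above, under the assumption that it takes the same analytical form as Fast-DetectGPT's conditional probability curvature, with the SPO-aligned model $p_{\theta}$ used for both scoring and conditional sampling (the notation is illustrative):

$$
\text{Style-CPC}(x) \;=\; \frac{\log p_{\theta}(x \mid x) \;-\; \tilde{\mu}}{\tilde{\sigma}}, \qquad
\tilde{\mu} \;=\; \mathbb{E}_{\tilde{x} \sim p_{\theta}(\cdot \mid x)}\!\left[\log p_{\theta}(\tilde{x} \mid x)\right], \qquad
\tilde{\sigma}^{2} \;=\; \mathbb{E}_{\tilde{x} \sim p_{\theta}(\cdot \mid x)}\!\left[\left(\log p_{\theta}(\tilde{x} \mid x) - \tilde{\mu}\right)^{2}\right]
$$

where $\tilde{x}$ denotes tokens conditionally resampled at each position; a larger curvature indicates text that follows the machine-style distribution more closely than a typical human-written passage.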

Why Do We Need to Imitate?


(a-c) Comparative examples of human-written, machine-generated, and machine-revised text. (d) Fast-DetectGPT shows a significant drop in detection accuracy when identifying machine-revised text compared to machine-generated text. (e) Our method brings a noticeable improvement over Fast-DetectGPT in detecting machine-revised text. “Fast-Det.” denotes “Fast-DetectGPT”.

How Do We Imitate?


Imitating the stylistic preferences of LLMs. (a) Token distributions before and after machine-style imitation, showing how the scoring model is deliberately fine-tuned to bias its token distribution towards a machine writing style (e.g., shifting preference from common words like “explore” to machine-favored tokens such as “delve”). (b) The Style Preference Optimization pipeline aligns the base scoring model with the style of machine-revised content using paired human-machine texts. The resulting machine-style scoring model produces token distributions $p(x_n|x_{0:n-1})$ for each position $n$, which are subsequently used to compute style-conditional probability curvatures.
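For readers who want a concrete picture of the alignment step, the following is a minimal PyTorch sketch of one SPO update. It assumes SPO follows a DPO-style preference objective in which the machine-revised text is the preferred sample and its human-written counterpart the rejected one; the base model identifier, optimizer settings, and beta value are illustrative placeholders rather than the authors' exact configuration.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative base scoring model (GPT-Neo-2.7B is the usual scoring-model source).
model_name = "EleutherAI/gpt-neo-2.7B"
tok = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name)   # scoring model being aligned
ref = AutoModelForCausalLM.from_pretrained(model_name)      # frozen reference copy
ref.eval()

def seq_logprob(model, text):
    """Sum of token log-probabilities of `text` under a causal LM."""
    ids = tok(text, return_tensors="pt").input_ids
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)  # predict token t from tokens < t
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).sum()

beta = 0.1                                                   # assumed preference temperature
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)  # assumed learning rate

def spo_step(machine_text, human_text):
    """One DPO-style update nudging the policy towards machine-styled text."""
    with torch.no_grad():
        ref_m, ref_h = seq_logprob(ref, machine_text), seq_logprob(ref, human_text)
    pol_m, pol_h = seq_logprob(policy, machine_text), seq_logprob(policy, human_text)
    # Preference margin: machine-revised ("chosen") vs. human original ("rejected").
    margin = beta * ((pol_m - ref_m) - (pol_h - ref_h))
    loss = -F.logsigmoid(margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Iterating spo_step over paired human-machine texts yields the machine-style scoring model used for detection below.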

Impact of Style-Conditional Probability Curvature (Style-CPC)


(Left) Conditional probability curvatures (CPC) from Fast-DetectGPT (denoted as “Fast-Det.”) applied to purely machine-generated text; (Middle) conditional probability curvatures applied to machine-revised text; (Right) style-conditional probability curvatures from our method applied to machine-revised text. The greater the separation between human-written texts (red) and machine-revised texts (blue), the more effective the detection.
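The detection side can then be sketched as follows, assuming Style-CPC keeps the analytical sampling scheme of Fast-DetectGPT, with the SPO-aligned model supplying both the scoring distribution and the conditional sampling distribution; the decision threshold is a hypothetical placeholder that would normally be chosen on validation data.

import torch
import torch.nn.functional as F

@torch.no_grad()
def style_cpc(model, tok, text):
    """Style-conditional probability curvature of `text` under the aligned scoring model."""
    ids = tok(text, return_tensors="pt").input_ids
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)  # next-token distributions
    probs = logp.exp()
    target = ids[:, 1:]

    log_likelihood = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1).sum()
    # Mean and variance of the log-probability when tokens are resampled from the model itself.
    mean = (probs * logp).sum(-1).sum()
    var = (probs * logp.pow(2)).sum(-1).sum() - (probs * logp).sum(-1).pow(2).sum()
    return ((log_likelihood - mean) / var.sqrt()).item()

def is_machine_revised(model, tok, text, threshold=0.0):     # threshold is a placeholder
    return style_cpc(model, tok, text) > threshold

A higher curvature means the text sits closer to the machine-style distribution, so a call such as is_machine_revised(policy, tok, candidate_text) flags a passage once its curvature exceeds the chosen threshold.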

Leaderboard

| Method | Time cost (s/1k words) | GPT-3.5 XSum | GPT-3.5 Writing | GPT-3.5 PubMed | GPT-3.5 Avg. | GPT-4o XSum | GPT-4o Writing | GPT-4o PubMed | GPT-4o Avg. |
|---|---|---|---|---|---|---|---|---|---|
| RoBERTa-base | 0.07 | 0.5806 | 0.7225 | 0.4370 | 0.5800 | 0.4921 | 0.4774 | 0.2496 | 0.4064 |
| RoBERTa-large | 0.11 | 0.6391 | 0.7236 | 0.4848 | 0.6158 | 0.4782 | 0.4708 | 0.3089 | 0.4193 |
| Likelihood | 0.38 | 0.4982 | 0.8788 | 0.5528 | 0.6433 | 0.4396 | 0.8077 | 0.4596 | 0.5690 |
| Entropy | 0.35 | 0.6742 | 0.3021 | 0.5662 | 0.5142 | 0.6122 | 0.2802 | 0.5899 | 0.4941 |
| LogRank | 0.36 | 0.4711 | 0.8496 | 0.5597 | 0.6268 | 0.4002 | 0.7694 | 0.4472 | 0.5389 |
| LRR | 0.41 | 0.4016 | 0.7203 | 0.5629 | 0.5616 | 0.3095 | 0.6214 | 0.4710 | 0.4673 |
| DNA-GPT | 35.92 | 0.5338 | 0.8439 | 0.3333 | 0.5703 | 0.4974 | 0.7478 | 0.3151 | 0.5201 |
| NPR | 111.99 | 0.5659 | 0.8786 | 0.4246 | 0.6230 | 0.5065 | 0.8444 | 0.3740 | 0.5750 |
| DetectGPT | 111.33 | 0.6343 | 0.8793 | 0.5608 | 0.6915 | 0.6217 | 0.8771 | 0.5612 | 0.6867 |
| Fast-DetectGPT | 0.72 | 0.7312 | 0.9304 | 0.7182 | 0.7933 | 0.6293 | 0.8324 | 0.6175 | 0.6931 |
| ImBD (Ours) | 0.72 | 0.9849 | 0.9871 | 0.8626 | 0.9449 | 0.9486 | 0.9468 | 0.7743 | 0.8899 |

Detection of GPT-3.5 and GPT-4o polished text. Typically, GPT-Neo-2.7B (Black et al. 2021) serves as the scoring model. NPR and DetectGPT use T5-3B (Raffel et al. 2020) to generate perturbations, whereas Fast-DetectGPT employs GPT-J (Wang and Komatsuzaki 2021) as a surrogate model to generate samples. Metric: AUROC.

| Method | XSum | Writing | PubMed | Avg. |
|---|---|---|---|---|
| GPTZero | 0.9542 | 0.9711 | 0.8800 | 0.9351 |
| ImBD (Ours) | 0.9849 | 0.9871 | 0.8626 | 0.9449 |

Comparison with GPTZero on detecting GPT-3.5-polished text. Metric: AUROC.

Performance on Diverse Tasks

| Method | Rewrite | Expand | Polish | Generate | Avg. |
|---|---|---|---|---|---|
| Likelihood | 0.4073 | 0.4564 | 0.6039 | 0.8939 | 0.5904 |
| Entropy | 0.5840 | 0.6629 | 0.5431 | 0.4129 | 0.5507 |
| LogRank | 0.3868 | 0.4273 | 0.5864 | 0.8925 | 0.5732 |
| LRR | 0.3488 | 0.3581 | 0.5183 | 0.8541 | 0.5198 |
| DNA-GPT | 0.4101 | 0.4901 | 0.5847 | 0.8931 | 0.5945 |
| NPR | 0.3606 | 0.5139 | 0.5673 | 0.8541 | 0.5740 |
| DetectGPT | 0.4060 | 0.6000 | 0.6615 | 0.8985 | 0.6415 |
| Fast-DetectGPT | 0.4499 | 0.7159 | 0.7989 | 0.9706 | 0.7338 |
| ImBD (Ours) | 0.8739 | 0.9758 | 0.9707 | 0.9996 | 0.9550 |

We evaluated the detection performance, measured by average AUROC, of text revised by leading LLMs (Qwen2-7B, Llama-3-8B, Mixtral-7B, Deepseek-7B, GPT-3.5, and GPT-4o) on the XSum dataset.

Extra Experiment on Text Length


Evaluations of detection accuracy for XSum polished texts trimmed to the specified word count.

More Visualization



ROC curves in log scale evaluated on the polish task of the XSum dataset, where the dashed lines denote the random classifier. “Fast-Det.” denotes “Fast-DetectGPT”.

BibTeX

@misc{chen2024imitatedetectaligningmachine,
      title={Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection}, 
      author={Jiaqi Chen and Xiaoye Zhu and Tianyang Liu and Ying Chen and Xinhui Chen and Yiwen Yuan and Chak Tou Leong and Zuchao Li and Tang Long and Lei Zhang and Chenyu Yan and Guanghao Mei and Jie Zhang and Lefei Zhang},
      year={2024},
      eprint={2412.10432},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.10432}, 
}