Abstract
Large Language Models (LLMs) have revolutionized text generation,
making machine-generated text increasingly difficult
to detect. Although prior methods achieve strong performance
on detecting purely machine-generated text, they
perform poorly at distinguishing machine-revised
text (rewriting, expansion, and polishing), which may
differ only slightly from the original human-written text. Because
the content of such text may originate from human prompts, detecting
machine-revised text often involves identifying distinctive
machine styles, e.g., wording favored by LLMs. However, existing
methods struggle to detect machine-style phrasing hidden
within the content contributed by humans. We propose the
“Imitate Before Detect” (ImBD) approach, which first imitates
the machine-style token distribution, and then compares the
distribution of the text to be tested with the machine-style
distribution to determine whether the text has been machine-revised.
To this end, we introduce style preference optimization
(SPO), which aligns a scoring LLM with the stylistic preferences
of machine-generated text. The aligned scoring
model is then used to calculate the style-conditional probability
curvature (Style-CPC), which quantifies the log-probability
difference between the original text and conditionally sampled
texts for effective detection. We conduct extensive comparisons
across various scenarios, encompassing text revisions
by six LLMs, four distinct text domains, and three machine
revision types. Compared to existing state-of-the-art methods,
our method yields a 13% increase in AUC for detecting text
revised by open-source LLMs, and improves performance by
5% and 19% for detecting GPT-3.5- and GPT-4o-revised text,
respectively. Notably, our method surpasses the commercially
trained GPT-Zero with just 1,000 samples and five minutes
of SPO, demonstrating its efficiency and effectiveness. Code,
data, and models are available.
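
For concreteness, the following is a minimal sketch of an SPO-style training objective, assuming it takes the form of a DPO-style preference loss in which machine-revised text is the preferred response and the human original is dispreferred. The function name `spo_loss`, the input layout, and the value of `beta` are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def spo_loss(policy_logp_machine: torch.Tensor,
             policy_logp_human: torch.Tensor,
             ref_logp_machine: torch.Tensor,
             ref_logp_human: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO-style preference loss with machine-revised text as the preferred
    response and the human original as dispreferred. Inputs are per-sequence
    log-probabilities under the trainable scoring model (policy) and a frozen
    reference copy. (Sketch only; details such as beta are assumptions.)"""
    machine_margin = policy_logp_machine - ref_logp_machine
    human_margin = policy_logp_human - ref_logp_human
    return -F.logsigmoid(beta * (machine_margin - human_margin)).mean()

# Toy usage with random per-sequence log-probs for a batch of 4 pairs.
loss = spo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())
```

Minimizing this loss pushes the scoring model to assign relatively higher likelihood to machine-styled text than its reference copy does, which is the "imitate" step the abstract describes.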
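The detection-side score can likewise be sketched as a conditional probability curvature computed with the aligned scoring model: the text's log-likelihood is compared against log-likelihoods of token sequences resampled from the model's own conditionals. This sketch assumes a causal LM loaded via Hugging Face `transformers`; the `gpt2` checkpoint stands in for the SPO-aligned scoring model, and `style_cpc` is a hypothetical name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in; ImBD scores with its SPO-aligned model instead.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def style_cpc(text: str, n_samples: int = 1000) -> float:
    """Conditional probability curvature: how far the text's log-likelihood
    sits above the log-likelihoods of texts resampled token-by-token from
    the scoring model's own conditionals (higher => more machine-styled)."""
    ids = tok(text, return_tensors="pt").input_ids
    logits = model(ids).logits[0, :-1]      # next-token logits per position
    log_probs = logits.log_softmax(-1)      # (seq_len - 1, vocab)
    targets = ids[0, 1:]                    # observed next tokens
    positions = torch.arange(targets.numel())

    # Log-likelihood of the observed text under the scoring model.
    ll_text = log_probs[positions, targets].sum()

    # Position-wise samples drawn from the same conditional distributions.
    samples = torch.distributions.Categorical(logits=logits).sample((n_samples,))
    ll_samples = log_probs[positions, samples].sum(-1)

    # Normalized discrepancy: the curvature score used for detection.
    return ((ll_text - ll_samples.mean()) / ll_samples.std()).item()

print(style_cpc("The quick brown fox jumps over the lazy dog."))
```

In use, this score would be compared against a threshold calibrated on held-out data, with higher values indicating machine-revised text.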