Whose Punctuation Is More Human: Yours or A.I.'s? (www.nytimes.com)

0 points | 1 day ago

🤖 AI Summary

A widely shared observation earlier this year, that ChatGPT overuses the em dash, sparked a debate about whether certain punctuation marks can reveal AI-generated text. The real takeaway isn’t simply “dashes = robot”; it’s that the signal is largely orthographic. ChatGPT often emits the typographically correct em dash (—) with no surrounding spaces, reflecting the print-style punctuation common in books and edited prose. Ordinary typists usually write hyphens (-), double hyphens (--), or spaced dashes instead, so the model’s output looks conspicuously “bookish” to readers, which in turn prompted claims that humans don’t use dashes at all.

For the AI/ML community this has three practical implications. First, surface cues such as specific punctuation are brittle detectors: they arise from the training-data distribution (large volumes of printed text) and from tokenization and normalization choices, not from any immutable marker of synthetic authorship. Second, as everyday writing shifts toward quick, typed speech, the boundary between “oral” web text and formal written prose moves, and models will keep reflecting whatever corpora they ingest. Third, engineers should treat orthographic artifacts as transient: they can be addressed through data curation, post-processing, or fine-tuning, and so they are not reliable provenance signals. In short, punctuation quirks can be useful heuristics, but they are neither definitive nor stable enough for long-term AI detection or evaluation.
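To make the "orthographic signal" concrete, here is a minimal Python sketch of the kind of surface heuristic the summary describes, together with the post-processing step that erases it. The category names, the function names `dash_profile` and `normalize_dashes`, and the choice of a spaced hyphen as the "typed" replacement are all assumptions made for illustration; nothing here comes from the article or from any real detection tool.

```python
import re

# Hypothetical dash categories (names are illustrative, not from the article).
# "\u2014" is the em dash, written as an escape so the source stays ASCII.
PATTERNS = {
    "em_unspaced": re.compile(r"(?<=\w)\u2014(?=\w)"),   # word<em dash>word, print style
    "em_spaced": re.compile(r"\s\u2014\s"),              # word <em dash> word
    "double_hyphen": re.compile(r"(?<!-)--(?!-)"),       # exactly two hyphens
    "spaced_hyphen": re.compile(r"\s-\s"),               # word - word
}


def dash_profile(text: str) -> dict[str, int]:
    """Count how often each dash style appears in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def normalize_dashes(text: str) -> str:
    """Rewrite every em dash (spaced or not) as a spaced hyphen.

    This is one possible post-processing choice, assumed for illustration.
    """
    return re.sub(r"\s*\u2014\s*", " - ", text)


if __name__ == "__main__":
    sample = "Bookish\u2014like this\u2014prose. Typed -- like this - prose."
    print(dash_profile(sample))
    print(dash_profile(normalize_dashes(sample)))
```

Running the profile before and after `normalize_dashes` shows why the cue is brittle: a single regular-expression substitution removes the "bookish" em dash count entirely, so a detector that relies on it can be defeated (or triggered by a careful human typist) with trivial effort.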
