The ATS Autopsy: what breaks when a machine reads your resume
Yes, a resume's layout changes how much of it a machine can read, and the gap is wide. In a test of 48 synthetic resumes read by three local open-source text extractors (pdfplumber, pdfminer.six, python-docx) in June 2026, a clean single-column resume lost 0 percent of its fields, while a resume whose contact block was built as an image lost 37.5 percent and dropped the name, email, and phone on every read. Treat that as a floor on what can go wrong in the reading step, not a verdict on any named ATS.
The file type matters far less than the layout. In that same June 2026 test, PDFs lost 19.9 percent of field reads and Word files lost 0 percent, but almost all of the PDF loss came from a few designed layouts, not from the format itself. Layout is the lever. Fix the handful of layouts that fight the reader, and the rest of your resume can look however you want.
What did we actually test?
We tested one narrow thing: whether a resume's design changes how much of it a machine can read. Most applicant tracking systems and resume parsers start the same way, by pulling the raw text out of your file, then mapping that text to fields like name, email, and work history. If the text step drops or scrambles content, every step after it inherits the damage. This study isolates that first step on controlled inputs.
We built 48 synthetic resumes from three fabricated personas, one in software, one in nursing, one in warehouse operations. No real people and no personal data: the emails use reserved example.com and example.org addresses and the phones use the reserved 555 exchange, so nothing resolves to a real inbox. Each persona carries identical content across nine layouts, saved in both PDF and DOCX, so any difference in the read traces to the layout and the format, not the words. We then read every file with three local open-source text extractors: pdfplumber 0.11.4, pdfminer.six 20231228, and python-docx 1.1.2. That is 75 resume-by-tool reads, each scored across eight fields, for 600 scored field reads. Nothing left the machine, and no commercial ATS was contacted at any point.
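As a rough illustration of that reading step, the sketch below pulls raw text from a file with the same three libraries. It is a minimal sketch, not the study's own script: the function names and the dispatch on file extension are assumptions made for this example, and the DOCX reader here only walks body paragraphs, which may differ from what the study's extraction did.

```python
from pathlib import Path


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for a page with no text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    return extract_text(path)


def extract_with_python_docx(path: str) -> str:
    from docx import Document  # pip install python-docx

    # Assumption: body paragraphs only. Tables, headers, and footers live
    # elsewhere in the python-docx object model and are not read here.
    return "\n".join(p.text for p in Document(path).paragraphs)


def extract_all(path: str) -> dict[str, str]:
    """Return raw text per extractor for one resume file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return {
            "pdfplumber": extract_with_pdfplumber(path),
            "pdfminer.six": extract_with_pdfminer(path),
        }
    if suffix == ".docx":
        return {"python-docx": extract_with_python_docx(path)}
    raise ValueError(f"unsupported file type: {suffix!r}")
```

Calling `extract_all("resume.pdf")` on your own file returns one text string per extractor, which you can then read the way a field-mapper would.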
Two honest notes about what these numbers are. First, they are deterministic engineering measurements, not sampled estimates, so a clean checkout reproduces the exact same figures and there are no sampling error bars to report. Second, one open-source resume parser failed to build on the test machine, so the field-mapping stage here is a transparent reproduction of a parser's field step rather than that parser itself. It searches the extracted text for each field the way a parser's field stage does, and the raw extracted text is saved for every read, so anyone can re-score it independently.
These three extractors are the same kind of text step that commercial systems run first, so read every number below as a floor on what can go wrong in the reading step, not a verdict on any named product. We did not test Workday, Greenhouse, Taleo, or any other named ATS, and nothing here claims a system rejects or discriminates against a candidate.
What breaks when a machine reads your resume?
Start with the whole picture. Across every layout and field in the June 2026 test of 48 synthetic resumes read by the three local open-source extractors, 86 of 600 field reads failed, a 14.3 percent loss rate. The loss is not spread evenly. A plain resume reads almost perfectly, and the damage piles up in a few designs that fight the reader. Read the whole set as a floor on the reading step, not a verdict on any named ATS.
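The headline arithmetic is easy to check. The short sketch below recomputes it from the counts reported in this article; the helper name `loss_rate` is made up for the example.

```python
def loss_rate(failed: int, total: int) -> float:
    """Percentage of field reads that failed, rounded to one decimal."""
    return round(100 * failed / total, 1)


TOTAL_READS = 75       # resume-by-tool reads reported above
FIELDS_PER_READ = 8    # name, email, phone, three jobs, education, skills
total_field_reads = TOTAL_READS * FIELDS_PER_READ

print(total_field_reads)                   # 600
print(loss_rate(86, total_field_reads))    # 14.3
# Losing three of eight fields on a read is consistent with the 37.5 percent
# figure for the image layout (name, email, and phone gone every time).
print(loss_rate(3, FIELDS_PER_READ))       # 37.5
```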
The clean baseline holds up. A single-column resume with standard headings lost 0 percent of its 72 field reads in that same June 2026 test. Renaming a section from "Experience" to "Where I've Worked" cost nothing either, because the reader lifts the content underneath the label, not the label itself. If your resume is one plain column, the reading step is not your problem, and no amount of keyword fiddling changes that.
The worst result was text rendered as a picture. In the June 2026 test, a resume whose name and contact block were saved as an image lost 37.5 percent of its scored fields, the highest of any layout, and it dropped the name, email, and phone on 100 percent of reads, because a picture of text leaves nothing for a text extractor to read. One caveat matters here. These extractors do not run OCR, and some commercial systems do, so treat the image result as a worst case rather than a certainty. Everything else on that resume, the work history and skills that stayed real text, came through fine. The catastrophe is narrow and total. It is only the parts you turned into a picture.
Parking your contact details in the page header or footer is the quieter version of the same mistake. In the June 2026 test, resumes with contact details in the header band and resumes with them in the footer band each lost 25 percent of fields overall, and the email specifically dropped on 66.7 percent of reads from each, because the extractor often reads the body of the page and skips the header and footer regions. These are text, not images, so a system that does capture those bands would do better. Still, on these local open-source extractors, the header or footer band was captured on only 3 of 9 reads. Read that as a floor: if your only email address lives in a header, you are betting it survives, and on these reads it usually did not.
An unusual or embedded display font lost 20.8 percent of fields in the same June 2026 test, when the font's glyphs extracted as garbage characters instead of letters. The name and contact details survived, and the body sections underneath the display font took the loss. This one is a coin flip between the two PDF extractors, which is what makes it dangerous. It can look perfect on your screen and read as noise to the machine. Again, a floor on local open-source extractors, not a verdict on any named ATS.
Side-by-side columns scramble the reading order. A three-column resume lost 14.6 percent of fields in the June 2026 test, and the failure landed on the most recent job, which dropped on 33 percent of reads as the columns interleaved and the employer ended up separated from the title. A two-column resume came through nearly clean at 1.4 percent, so two columns are usually survivable and three are where the reading order tends to break.
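A toy model shows why side-by-side columns scramble. It is a deliberate simplification, not how any of the three extractors is implemented: it assumes a reader that walks the page one visual row at a time, and the persona text is invented for the example.

```python
from itertools import zip_longest


def read_row_by_row(columns: list[list[str]]) -> str:
    """Simulate a reader that takes one visual row across all columns at a time."""
    rows = zip_longest(*columns, fillvalue="")
    return "\n".join(" ".join(cell for cell in row if cell) for row in rows)


def read_column_by_column(columns: list[list[str]]) -> str:
    """Simulate the order a human reads: finish one column, then the next."""
    return "\n".join(line for column in columns for line in column)


# Hypothetical three-column resume fragment
columns = [
    ["Senior Software Engineer", "Acme Logistics"],
    ["SKILLS", "Python, SQL"],
    ["EDUCATION", "BS Computer Science"],
]

print(read_row_by_row(columns))
# Senior Software Engineer SKILLS EDUCATION
# Acme Logistics Python, SQL BS Computer Science
```

In the row-by-row output the job title and its employer no longer sit next to each other, which is the kind of separation a field-mapper cannot undo.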
Work history laid out in a bordered table lost 12.5 percent of fields in that same June 2026 test, when the table cells collapsed together and the tokens for a single role no longer sat next to each other. The tables linearized cleanly on 6 of 9 reads, so a table is a gamble, not a guaranteed loss. All of this stays a floor from local open-source extractors, not a verdict on any product.
Now the rule everyone repeats, and the way this test breaks it. In the June 2026 test, PDFs lost 19.9 percent of field reads and Word files lost 0 percent. Read fast, that says never send a PDF. Read honestly, it says something narrower. Almost all of the PDF loss came from the hard layouts above, and a clean single-column text-based PDF lost nothing. The old "always PDF" and "always Word" rules both miss the real failure mode, which is a resume saved as an image instead of as text. The format war is a distraction. The picture is the catastrophe.
ATS Autopsy, June 2026
- Total field loss: 14.3 percent (86 of 600 field reads).
- Clean single-column: 0 percent lost.
- Contact block built as an image: 37.5 percent lost, with name, email, and phone gone on every read.
- By file type: PDF 19.9 percent, Word (DOCX) 0 percent.
- Basis: 48 synthetic resumes read by three local open-source text extractors (pdfplumber, pdfminer.six, python-docx).
These extractors do not run OCR, and some commercial systems do, so the image figure is a worst case. Read the whole card as a floor on what can go wrong in the reading step, not a verdict on any named ATS.
How do you check your own resume by hand?
You can run the same check the machine runs, in about ten minutes, with tools you already have.
The parse test. Copy your whole resume and paste it into the plainest text editor you own, the one with no formatting at all. Read what lands. If your name, email, phone, and every job title come through in the right order, a parser will manage the same. If the contact line vanishes, or two side-by-side columns interleave so a job title and its employer end up separated by a stretch of unrelated text, you have found a layout problem the reader hits before any human sees you. In this study, a field only passed when the right tokens stayed within a short window of each other, 40 characters for a name and 60 for a job title next to its employer, because that adjacency is what a downstream field-mapper needs and what a column scramble destroys.
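The adjacency rule can be sketched in a few lines. This is one plausible reading of it, not the study's scorer: it assumes "within a window" means every token can be found inside a single stretch of text no longer than the window, matched case-insensitively. The function name and the sample resume text are hypothetical.

```python
import re
from itertools import product


def tokens_within_window(text: str, tokens: list[str], window: int) -> bool:
    """True if every token occurs and some set of occurrences fits in `window` chars."""
    haystack = text.lower()
    spans_per_token = []
    for token in tokens:
        spans = [
            (m.start(), m.end())
            for m in re.finditer(re.escape(token.lower()), haystack)
        ]
        if not spans:
            return False  # token missing entirely
        spans_per_token.append(spans)
    # Brute force over occurrences; fine for resume-sized text
    for combo in product(*spans_per_token):
        start = min(s for s, _ in combo)
        end = max(e for _, e in combo)
        if end - start <= window:
            return True
    return False


clean = "Senior Software Engineer, Acme Logistics\n2021 to present"
scrambled = (
    "Senior Software Engineer\nSKILLS\n"
    "Python, SQL, Kubernetes, Terraform, Docker, Linux\nAcme Logistics"
)
job = ["Senior Software Engineer", "Acme Logistics"]

print(tokens_within_window(clean, job, 60))      # True
print(tokens_within_window(scrambled, job, 60))  # False
```

Paste your own extracted text in place of `clean` to see whether a title and its employer still sit inside the 60-character window.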
The picture test. Open your resume and try to select your name and contact details with your cursor, the way you would to copy them. If the text highlights, it is real text and a parser can read it. If your name is part of an image and will not highlight, it is invisible to a text extractor, which is the single worst thing you can do to a resume, the one that produced the 100 percent contact-loss result above.
The header test. Check whether your email and phone sit in the page header or footer rather than in the body of the page. Move them into the body. On the local open-source extractors in this test, the header or footer band was captured on only 3 of 9 reads, so contact details up there survived about one read in three.
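Both the picture test and the header test fail the same way: the contact tokens are simply absent from the extracted text. The sketch below checks for that. It assumes you already have the extracted text as a string; the helper names and the sample persona are invented, using the same reserved example.com and 555 conventions as the study.

```python
import re


def digits_only(value: str) -> str:
    """Strip everything except digits so phone formatting does not matter."""
    return re.sub(r"\D", "", value)


def missing_contact_fields(extracted_text: str, name: str, email: str, phone: str) -> list[str]:
    """Return which of name, email, phone cannot be found in the extracted text."""
    text = extracted_text.lower()
    missing = []
    if name.lower() not in text:
        missing.append("name")
    if email.lower() not in text:
        missing.append("email")
    # Simplification: compares against all digits in the text run together,
    # which is lenient but good enough for a quick self-check.
    if digits_only(phone) not in digits_only(extracted_text):
        missing.append("phone")
    return missing


# Hypothetical reads of the same resume
body_only = "EXPERIENCE\nStaff Nurse, Example Hospital\nSKILLS\nTriage, charting"
full_read = "Jordan Rivera\njordan.rivera@example.com | (212) 555-0142\n" + body_only

persona = ("Jordan Rivera", "jordan.rivera@example.com", "212-555-0142")
print(missing_contact_fields(body_only, *persona))  # ['name', 'email', 'phone']
print(missing_contact_fields(full_read, *persona))  # []
```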
The worksheet, run in order. One column beats three. Keep a standard section order, though the exact heading words do not matter. Put contact details in the body of the page, never in a header, footer, or image. Save as a text-based PDF or a Word file, never as a scan or an exported picture. Run those four passes and you have removed almost every failure this study found.
What can't this test tell you?
This is a measurement of the reading step, and only that. It does not tell you whether a specific company's ATS accepts your resume, because we did not test any named product, and real systems layer their own parsing and matching on top of a text step like this one. The image result in particular is a worst case, because these extractors do not run OCR and some commercial systems do, so a name saved as a picture might survive a system that ours could not. And because the field-mapping here is a transparent reproduction of a parser's field step rather than a full commercial parser, treat the layout rankings as directional. The order of danger is solid. The exact percentage a given vendor would produce is not something this test claims to know.
The manual checks above have a ceiling too. They tell you whether your resume reads cleanly in general. They cannot tell you how your specific file scores against a specific job posting, which fields a real parser mapped where, or how relevant your text looks next to the must-haves the recruiter actually searched for. Doing the paste test by hand also gets tedious by the third time you tailor a resume in a week.
You can run every check above by hand, and you should at least once, because it teaches you exactly what the reader is blind to. When you want the two-minute version that flags which fields survived the read and which dropped, and scores your text against a specific posting, run your file through UnchartedCareer's free ATS resume scan. Same logic as this study, faster.
Five stats you can quote
- In a June 2026 test of 48 synthetic resumes read by three local open-source text extractors (pdfplumber, pdfminer.six, python-docx), 86 of 600 field reads failed, a 14.3 percent overall loss rate, a floor on what can go wrong in the reading step and not a verdict on any named ATS.
- In that same June 2026 test, single-column resumes lost 0 percent of 72 field reads while resumes with text saved as an image lost 37.5 percent, the highest of any layout; these extractors do not run OCR and some commercial systems do, so treat the image figure as a worst case.
- In that test, a contact block rendered as an image dropped the candidate's name, email, and phone on 100 percent of reads, because a picture of text leaves nothing for a text extractor to read; the extractors ran no OCR, so this is a worst case, not a verdict on any product.
- Across every layout in the June 2026 test of 48 synthetic resumes on local open-source extractors, PDFs lost 19.9 percent of field reads and Word (DOCX) files lost 0 percent, with nearly all of the PDF loss coming from a few designed layouts rather than the format itself.
- In that test, contact details placed in a page header or footer dropped the email address on 66.7 percent of reads, and a three-column layout dropped the most recent job on 33 percent of reads, both measured on local open-source extractors as a floor on the reading step, not a verdict on any named ATS.
The full dataset is open. Every scored read is in results.csv and the structured roll-up is in results.json, published under a Creative Commons Attribution 4.0 International license (CC BY 4.0), so you can re-score or reuse the numbers with attribution.
How did we run this, and where can you get the data?
The ATS Autopsy is our own original test, run on 2026-06-24 and cited throughout as June 2026. We generated 48 synthetic resumes from three fabricated personas across nine layouts and saved each as PDF and DOCX, with the PDFs rendered through a headless Chromium shell. We then read every file with three local open-source extractors: pdfplumber 0.11.4, pdfminer.six 20231228, and python-docx 1.1.2. That produced 600 scored field reads across eight fields (name, email, phone, three jobs, education, skills), where a field passes only when its content is recoverable in the right order from the extracted text. The generator and scorer are deterministic, so the same checkout reproduces these exact numbers, which is why we report no sampling error: these are engineering measurements, not estimates.
We did not test any named commercial ATS, and we make no claim that any product rejects or discriminates against a candidate. These extractors are the same kind of text step that commercial systems run first, so the numbers are a floor on what can go wrong in the reading step. The image-loss figures assume no OCR, and some commercial systems run OCR, so those are a worst case. The field-mapping stage is a transparent reproduction of a parser's field step, and the raw extracted text for every read is saved so anyone can re-score it.
The dataset is published under a Creative Commons Attribution 4.0 International license (CC BY 4.0). Per-read results are in results.csv, the structured roll-up is in results.json, and the raw extracted text for each read is available for anyone who wants to check the scoring independently.
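If you want to re-score the per-read file yourself, a roll-up by layout takes only a few lines. The column names in results.csv are not documented in this article, so `layout` and `passed` below are placeholders to swap for the real headers, and the sample rows are made up purely to show the shape of the output.

```python
import csv
import io
from collections import defaultdict


def loss_by_group(csv_file, group_col: str = "layout", passed_col: str = "passed") -> dict[str, float]:
    """Percent of failed field reads per group; column names are assumptions."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for row in csv.DictReader(csv_file):
        key = row[group_col]
        totals[key] += 1
        if row[passed_col].strip().lower() not in {"1", "true", "yes"}:
            failed[key] += 1
    return {key: round(100 * failed[key] / totals[key], 1) for key in totals}


# Illustrative rows only, not taken from the published dataset
sample = io.StringIO(
    "layout,field,passed\n"
    "single_column,name,1\n"
    "single_column,email,1\n"
    "image_contact,name,0\n"
    "image_contact,email,0\n"
    "image_contact,skills,1\n"
    "image_contact,education,1\n"
)
print(loss_by_group(sample))
# {'single_column': 0.0, 'image_contact': 50.0}
```

With the real file, pass `open("results.csv", newline="")` in place of `sample` and adjust the two column arguments to match its header row.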