PDF accessibility – a practical guide (Part 1)
‘How to make PDFs accessible’ is one of the questions we’re asked most often. In this two-part guide, we’ll give you some practical tips.
What we’ll cover in part 1 (this post)
- A little bit about how PDFs work
- How PDFs are experienced by people with a disability
We’ll discuss what PDFs are, why they were designed, and their limitations.
We’ll talk about how scanning, scrolling, and understanding context are far harder for someone who is, say, blind.
What we’ll cover in part 2 (the next post)
- What the law requires
- How to actually make a PDF accessible
We’ll touch upon the legal requirements of the EU Accessibility Directive, but the principles apply globally.
Finally, in terms of actually fixing PDFs, we’ll take two approaches. Firstly, we’ll talk about amending your source document, and secondly, we’ll discuss working on existing PDFs for which you have no source.
A little bit about PDFs
Let’s start by talking about what the Portable Document Format is, why it was designed, and why making a PDF accessible presents a considerable challenge.
I’m not going to give an extensive history of the PDF (Wikipedia is your friend), but suffice it to say it has been around for a long time (since 1993, in fact). This predates any modern accessibility standard by about 20 years.
The PDF was invented to provide a document format that looks the same wherever you open it. A file could be transmitted or printed without any change to the original design. This was fairly groundbreaking back in the early 1990s.
Main differences between HTML and PDF
What a PDF does really well is describe the appearance of a document.
However, what it absolutely doesn’t do is anything else. It was some time after the PDF was invented that its developers realized extra information would be needed if PDFs were to be made accessible.
So meaning was added on top of appearance, essentially by tacking it on as an extra layer.
In contrast, HTML works the opposite way: it adds meaning first, and appearance later.
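To make the distinction concrete, here is a minimal sketch in Python. It is a simplified model, not real PDF or HTML syntax, and every name in it (DrawnText, Element, guess_headings, and so on) is made up for illustration. The appearance-first page only knows where text sits and how big it is, so finding headings means guessing. The meaning-first page states each element’s role outright.

```python
from dataclasses import dataclass

# Appearance-first (PDF-like): each run of text only records where it is
# drawn and how it looks. Hypothetical, simplified model.
@dataclass
class DrawnText:
    text: str
    x: float
    y: float
    font_size: float

# Meaning-first (HTML-like): each element states its role; styling is
# applied separately.
@dataclass
class Element:
    role: str  # e.g. "h1" or "p"
    text: str

HEADING_ROLES = {"h1", "h2", "h3", "h4", "h5", "h6"}

pdf_like_page = [
    DrawnText("Annual report", x=72, y=720, font_size=24),
    DrawnText("Revenue grew this year.", x=72, y=690, font_size=11),
]

html_like_page = [
    Element("h1", "Annual report"),
    Element("p", "Revenue grew this year."),
]

def headings(elements):
    """Read headings directly from the stated roles."""
    return [e.text for e in elements if e.role in HEADING_ROLES]

def guess_headings(runs, threshold=18):
    """Guess headings from font size alone: a heuristic that can be wrong."""
    return [r.text for r in runs if r.font_size >= threshold]
```

The guess works for this tiny page, but a large pull quote or a decorative caption would be reported as a heading too. That is the kind of gap an untagged PDF leaves for assistive technology.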
This may seem like an academic distinction but the consequences for your content team are significant.
So what does this mean for you?
PDFs are easy to make look consistent, but they are far harder to make accessible. By default, the information you need to make them accessible is simply not there.
HTML in comparison is easier to make accessible but tends to vary in appearance.
What are PDFs good for? Absolutely nothing?
So as I mentioned earlier, PDFs are a way to create information on one computer and send it to another, and the recipient will get the exact same information. We actually take this for granted these days but this really was a big thing at the time.
PDFs are also permanent. If you tried to open a web page from 1998 on a modern computer, the chances are it would either not work at all or look different from how it was originally (terribly) designed.
But with a PDF, it will look the same now as it did 20 years ago.
You can probably rely on all your PDFs working for another 50 years.
They’re also great for printed media (things like tax forms). You just want something that you can print and write on that’s always going to be the same.
So in summary, PDFs are great for:
- Printed media
Unfortunately, these same qualities make them utterly awful for:
- Mobile experience
As we’ll see in part 2, fixing PDFs is mainly carried out through patches and hacks. If you’re creating them from another source, like a Word file, you also have the challenge of maintaining that file as well.
If you’ve ever used a PDF on a phone (and of course you have), then you already know what a terrible experience it is for everyone.
The Government Digital Service here in the UK, which builds all the gov.uk websites using strict accessibility standards, suggests that you shouldn’t use PDFs at all. Instead, you should create all content as HTML.
Experiencing PDFs with a disability
PDFs are actually a good way to get familiar with the basics of accessibility, because they highlight a tonne of things that we just take for granted.
Imagine you’re a sighted person looking at a PDF. You’ll take for granted the fact that you can scan through the page headings, scroll using your mouse wheel, and understand the context of the information by looking at supporting images.
You can pretty much glance at the PDF and get a fairly quick understanding of what it’s about without too much thought.
But this isn’t the case for everyone. Not everyone can just scroll through a document. Some people might be using a screen reader that reads your PDF word-for-word. It might let them skip forward and backward through sentences, but the experience is much like listening to a podcast.
The core concept here is that a listener only gets a slice of the information at a time, without the context of everything around it.
The two main problems with PDFs
For a screen reader user, there are two fundamental issues that arise:
- Order and context of page content are not guaranteed
- Navigating large documents takes forever
Order and context not guaranteed
Screen readers will not read out the content of a PDF in order unless you explicitly tell them what that order should be.
This is because of the way PDFs are created. You’ll add text blocks to your document in, say, Adobe Acrobat, and they may get read out in the order you added them. Screen readers follow the order in which the content is stored in the file (in a tagged PDF, the order of the tags) rather than where it appears on the page.
The solution is to set a logical order.
“Meaningful Sequence” is WCAG Success Criterion 1.3.2 and is a Level A requirement, so it’s important that you consider it when creating PDFs.
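The ordering problem can be sketched in a few lines of Python. This is an illustration only: the TextBlock type, its fields, and the sample page are all hypothetical, and no real PDF library is involved. It shows how the order in which blocks were added can differ from the order a sighted reader sees.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    top: float   # distance from the top of the page (hypothetical units)
    left: float  # distance from the left edge

# Blocks listed in the order the author happened to add them.
# The title was added after the body, even though it sits at the top.
blocks_as_stored = [
    TextBlock("Body paragraph", top=120, left=72),
    TextBlock("Page title", top=40, left=72),
    TextBlock("Footnote", top=700, left=72),
]

def stored_order(blocks):
    """What gets read out if nobody sets a logical order."""
    return [b.text for b in blocks]

def visual_order(blocks):
    """Naive guess at the visual order: top to bottom, then left to right."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.top, b.left))]

print(stored_order(blocks_as_stored))
print(visual_order(blocks_as_stored))
```

Sorting by position fixes this simple page, but it would interleave the lines of a two-column layout, which is why the reading order has to be set deliberately rather than inferred.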
Navigating through PDFs takes forever
Because screen readers can’t see the whole PDF at once, and only deliver a slice of it at a time, the user won’t know:
- how big the PDF is
- what else is on the PDF
The only way to find out is by quite literally listening to every word. This gets very tiring very quickly. It’s bad enough for a single page but imagine trying to navigate through an instruction manual or a 200-page novel.
The solution is to implement semantically correct headings, just as you would for a web page.
This means tagging each heading with its level and nesting subheadings beneath their parent headings. It lets screen reader users get a list of, say, chapters, and skip to the one they want.
In code, headings look like this:
<h1>Headings and Subheadings</h1>
  <h2>Purpose of Headings</h2>
    <h3>Heading Levels</h3>
  <h2>Meaning vs. Formatting</h2>
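To show why headings matter for navigation, here is a rough Python sketch that turns the headings above into the kind of outline a screen reader can offer. It uses HTML because Python’s standard library can parse it (html.parser); the HeadingOutline class and build_outline function are our own illustrative names, not part of any screen reader or PDF tool.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently being read
        self._parts = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None

def build_outline(markup):
    """Return the headings in document order with their levels."""
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline

markup = (
    "<h1>Headings and Subheadings</h1>"
    "<h2>Purpose of Headings</h2>"
    "<h3>Heading Levels</h3>"
    "<h2>Meaning vs. Formatting</h2>"
)

# Print an indented table of contents, one heading per line.
for level, title in build_outline(markup):
    print("  " * (level - 1) + title)
```

With an outline like this, a user can jump straight to “Meaning vs. Formatting” instead of listening to everything before it. Without heading tags, the outline is simply empty.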
If you’re in charge of creating PDFs or making PDFs accessible, then you’ve probably already come across some or all of the problems I’ve just outlined.
So, how do you fix them?
In an ideal world, you’ll find the source document you created the PDF from and edit that. In a less-than-ideal world, you’ll be editing PDFs for which you don’t have the source.
Part 2 of this two-part series delves into the practicalities of making PDFs accessible.