Accessibility: Images, “Alt” tags, and the “Out Loud” Experience

Most of us take for granted the rather quiet, personal experience of browsing the web. Our eyes scan an entire page in milliseconds, drawn to certain headlines or images and skipping over others that we deem to be “noise.” For about 8 million Americans, though, their experience of the web is vastly different. It’s loud, it’s brash, and when web pages are not formatted correctly, it’s confusing or just plain broken.

A good developer knows that well-formed HTML markup provides clues to users with different abilities so that all content and context is communicated. HTML is very good at that. The “alt” attribute on an image (commonly called the “alt” tag, as we do throughout this article) is a great example. An image can be seen by sighted users, and the “alt” tag is meant to give those who cannot see the image a description of what it is communicating. It’s also useful when the image does not load at all: the “alt” text commonly displays in place of a broken image and might help a sighted user understand what they are missing.

But how many of us understand what happens when we forget to add meaningful “alt” text? How many of us have heard assistive technology read a page out loud? How many of us understand what a broken experience is like for a non-sighted user?

Oomph takes accessibility very seriously. This means that we need to train website admins to understand why something like the “alt” tag is important for non-sighted users. Data and real-world examples are the best way to do that. Therefore, we created a few scenarios and tested them with the top three screen readers by market share in order to illustrate what happens when HTML is not formatted correctly, and what a difference it makes when it is done properly.


With our examples, we hope to let everyone know — clients and developers alike — the importance of something as fundamental as image “alt” text.


In this article we will explore:

  • What is the “out loud” experience like with some of the more common assistive technologies?
  • What happens when an image does not have a well-formed “alt” tag?
  • When is it acceptable to supply an empty “alt” tag?
  • What happens when an image is inside an anchor (link), and what does it sound like?
  • How descriptive does your “alt” text need to be?

Ready? Let’s get going.

The “Out Loud” Experience

For daily users of screen readers, the voice of their software and the preferences they have in place increase their comfort level. These might include the type of voice, the speed of the voice, and the shortcuts that might be in place to navigate the page. It might be a surprise that most users have their screen readers set to a fast speed, almost like speed reading. A faster voice means that they can navigate and understand the page faster. It might seem jarring to someone who uses a screen reader only rarely, but a daily user can get used to the fast pace easily. Here are some video examples from YouTube (not captioned for non-sighted users, but rather, videos to show sighted users what a non-sighted user experiences on the web).

A video sample of a user and their JAWS (Job Access With Speech) screen reader navigating the Cheesecake Factory site.

This is a great example of someone using a site with a screen reader and encountering an interactive element on a page that presents them with difficulty. In this example it is a carousel on the homepage. The user also slows down their screen reader so that we, the viewer, can understand it.

An example video of the open source NVDA (NonVisual Desktop Access) screen reader reading a webpage.

This example is a promotion for the free NVDA screen reader. It’s a good example of what someone will hear while reviewing a simple web page. This is a short one, with no human user talking but simply navigating the reader from one element to the next with the keyboard.

The difficulties & inconsistencies with screen readers

What is a bit harder to manage and get used to is the way in which words on a page are read out loud. Some words are ambiguous when spoken, which can lead to confusion: homographs (words spelled the same but pronounced differently depending on context, like “content”) and homophones (different words that sound alike, like “red” and “read”). Punctuation and HTML elements being read out loud take a little getting used to as well. JAWS, for example, is quite literal in its interpretation of written language and HTML elements. Listening to it is a very robotic experience, with punctuation read out loud quickly and with no pause.

But, of course, because this is the world of software, different screen readers read differently from each other and differently from one browser to another. JAWS and NVDA, two popular screen readers for Windows, announce bulleted lists as described below. VoiceOver for Apple products does not. All three announce the heading level of important text on the page, but when it comes to images and “alt” tags, they differ again.

  • A book title like “Breakfast with Socrates: An Extraordinary (Philosophical) Journey” is read as “Breakfast with Socrates colon An Extraordinary open paren Philosophical close paren Journey” for JAWS and NVDA users.
  • Heading levels are read out loud, followed by the content of the heading. A primary heading with the content “An Article About Headings” will be read as “Heading level 1 An Article About Headings”.
  • When the software encounters a bulleted list on a page, it will start the list by saying “List with 8 items” and then read each bullet out loud by saying “bullet” and then the content of the list item. It will then end the list by saying “list end”.
  • A term like “counter-clockwise” will be read as “counter hyphen clockwise” unless the user has set preferences to ignore compound words.
  • Some text symbols are not read out loud consistently. NVDA does not say “asterisk” when it encounters one on a form, where it denotes a required field, for example.
  • Math operations are not announced properly by default. JAWS reads the plus symbol correctly, but not the minus symbol, saying “2 dash 1” when it should be “2 minus 1”. JAWS also reads the ≥ symbol as “equals” when it should be “greater than or equal to”.
  • VoiceOver doesn’t announce quotation marks, parentheses, or dashes. It does pause briefly, but users won’t know the difference between a dash, a parenthesis, or a quotation mark because all pauses sound the same. The different semantic meanings of these text symbols are lost to users.
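
As a rough illustration, the markup below is the kind of structure behind the heading and list announcements described above. The heading text comes from the example in the list; the list items are placeholders, and the announcements in the comments simply follow the pattern reported for JAWS and NVDA rather than a separate test.

```html
<!-- Reported pattern: "Heading level 1", then the heading text -->
<h1>An Article About Headings</h1>

<!-- Reported pattern: "List with 3 items", then "bullet" before
     each item, then "list end" -->
<ul>
  <li>First item</li>
  <li>Second item</li>
  <li>Third item</li>
</ul>
```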

What happens when an Image does not have any “alt” text?

We looked at three different screen readers — JAWS, NVDA, and VoiceOver. According to WebAIM’s 2017 survey, JAWS is used by almost half of all people that use screen readers (46.6%). NVDA follows behind that (31.9%), and VoiceOver behind that (11.7%). Other systems have less than 3% usage in the population. It should be noted that JAWS is a paid product for Windows, while NVDA is free and open source for Windows, and VoiceOver is bundled with all Apple devices.

When an image on a webpage does not have an “alt” description, or has an incomplete alt, these three software products handle it in different ways. We tested a few variations of empty “alt”s as well, just to document what would happen. The results for image tags are:

| Scenario and markup | JAWS | NVDA | VoiceOver |
|---|---|---|---|
| 1a. Not Ideal: No alt <img src="oomph.png"> | Blank | (nothing) | Oomph dot p n g image |
| 2a. Ideal: Empty alt <img src="oomph.png" alt=""> | Blank | (nothing) | (nothing) or New line |
| 3a. Not Ideal: Non-empty but meaningless alt <img src="oomph.png" alt=" "> (DON’T DO THIS) | Blank | Graphic | New line |
| 4a. Not Ideal: Non-empty but meaningless alt <img src="oomph.png" alt="&nbsp;"> (DON’T DO THIS) | Graphic | Graphic | Image |
| 5a. Most Ideal: Non-empty but meaningful alt <img src="oomph.png" alt="Oomph logo"> | Graphic Oomph logo | Graphic Oomph logo | Oomph logo Image |

The three major players disagree on how to handle images with empty or meaningless “alt” descriptions. NVDA does what we want it to do in cases where an “alt” is not supplied or an empty one is supplied, but JAWS feels like it needs to announce the fact that something might be there. It likes to say Blank. This could lead a user to wonder if they are missing something that is not accessible to them.

VoiceOver does something more expected — or at least, it acts like we have been told how screen readers should act. It reads out the file name when the “alt” tag is not present. It “skips” an image with an empty “alt” tag by saying new line, which is what we would want it to do if we purposefully leave the “alt” tag blank.

When is it acceptable to supply an empty “alt” tag?

In some cases it is acceptable and preferred to have an “alt” tag present but blank. The “alt” tag is used when the image is important and contributes to the understanding of the page content or actions. In cases where the image is purely decorative, it is preferred to use the “alt” tag but leave it blank. If the image is an icon and there is already descriptive text associated with it, or if the image does not contribute to the story that that page is telling, leave the “alt” tag blank. In other words, if a non-visual user would not miss the image, don’t describe it.


But the “alt” tag should still be present; otherwise, something more confusing might happen: the file name of the image might get read out loud, as VoiceOver does. This might be more harmful to the understanding of the page. Consider these examples and how they get read out loud:

| Scenario and markup | JAWS | NVDA | VoiceOver |
|---|---|---|---|
| 1b. Not Ideal: Image and no alt <img src="mobile-phone-icon.svg">Contact Us | Contact us | Contact us | mobile hyphen phone hyphen icon dot s v g image Contact us |
| 2b. More Ideal: Image and empty alt <img src="mobile-phone-icon.svg" alt="">Contact Us | Contact us | Contact us | Contact us |
| 3b. Most Ideal: Image and empty alt with aria hidden <img src="mobile-phone-icon.svg" alt="" aria-hidden="true">Contact Us | Contact us | Contact us | Contact us |

While the two more popular screen readers agree that an empty “alt” and no “alt” are the same, the built-in screen reader for Apple products announces the file name of the image when there is no “alt” tag present. This experience should be avoided.

3b is the most ideal because, in addition to the semantically correct empty “alt” tag, it adds the aria-hidden attribute as a further request that screen readers ignore the element. VoiceOver ignores this attribute, but the result is the same whether or not it is there. It simply goes the extra mile to ensure that screen readers don’t try to announce content that is not meaningful to a non-sighted user.
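
To make the distinction concrete, here is a minimal sketch that contrasts a decorative icon with an informative image. The file names come from the tables above; the wrapping link and its /contact address are assumptions added only for illustration.

```html
<!-- Decorative icon: the visible text already says everything,
     so the alt is present but empty (row 3b). The link and its
     href are hypothetical. -->
<a href="/contact">
  <img src="mobile-phone-icon.svg" alt="" aria-hidden="true">
  Contact Us
</a>

<!-- Informative image: there is no adjacent text, so the alt
     carries the meaning (row 5a) -->
<img src="oomph.png" alt="Oomph logo">
```

A quick test for which pattern applies is the one stated above: if a non-visual user would not miss the image, leave the “alt” empty.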

What happens when an image is inside an anchor (link), and what does that sound like?

A common scenario is an image surrounded by a link — for example, the image of an article is used as a link to that article, along with the headline and maybe an excerpt. There are a number of different ways to approach the HTML markup of scenarios like this.

How do screen readers announce this markup and what does it sound like?

| Scenario and markup | JAWS | NVDA | VoiceOver |
|---|---|---|---|
| 1c. Not Ideal: Link and Image no alt <a href="//google.com"><img src="oomph.png"></a> | Link graphic oomph | Link graphic Google dot com | Link oomph dot p n g image |
| 2c. Less Ideal: Link and Image with empty alt <a href="//google.com"><img src="oomph.png" alt=""></a> | Link graphic Oomph | Link graphic Google dot com | Link h t t p slash slash google dot com slash |
| 3c. More Ideal: Link and Image with meaningful alt <a href="//google.com"><img src="oomph.png" alt="Oomph logo"></a> | Link graphic Oomph logo | Link graphic Oomph logo | Link image Oomph logo image |
| 4c. More Ideal: Link with title, image no alt <a href="//google.com" title="Navigate to Google"><img src="oomph.png"></a> | Link Navigate to Google | Link Navigate to Google | Link Navigate to Google |
| 5c. Most Ideal: Link with title, image with empty alt <a href="//google.com" title="Navigate to Google"><img src="oomph.png" alt=""></a> | Link Navigate to Google | Link Navigate to Google | Link Navigate to Google |
| 6c. Less Ideal: Link with title, image with meaningful alt <a href="//google.com" title="Navigate to Google"><img src="oomph.png" alt="Oomph logo"></a> | Link graphic Oomph logo | Link graphic Oomph logo | Link image Oomph logo image |
| 7c. More Ideal: Link with title, image with meaningful alt, adjacent text <a href="//google.com" title="Navigate to Google"><img src="oomph.png" alt="Oomph logo">Click image to go to Google</a> | Link graphic Oomph logo click image to go to Google | Link graphic Oomph logo click image to go to Google | Link graphic Oomph logo click image to go to Google |
| 8c. Most Ideal: Link with aria-label, image with meaningful alt, adjacent text <a href="//google.com" aria-label="Navigate to Google"><img src="oomph.png" alt="Oomph logo">Click image to go to Google</a> | link navigate to Google (“Alt” and adjacent text ignored!) | link navigate to Google (“Alt” and adjacent text ignored!) | Link Oomph logo image link click image to go to Google (Aria label ignored!) |

These scenarios are complicated. Let’s address them one by one and the reasons why they are more or less ideal for our intentions:

  1. Image with no alt: Not ideal because VoiceOver announces the file name of the image. The filename of the image is the least important bit of information.
  2. Image with empty alt: Less ideal (but more ideal than 1c) because the URL gets spelled out by VoiceOver. The destination of the link is more important than the name of the image, for sure, but it is a jarring, technical experience to hear the URL like that. Notice how it does not read the URL as we have it typed out, but rather, in the manner that the browser interprets it.
  3. Image with meaningful alt: More ideal because we have supplied the screen reader with a bit of information and it is being announced. We bothered to add alt content, and it is being used, but a description of the image is arguably less important than the destination or description of the link. The word “link” is the only hint to a non-sighted user that this element will take them somewhere else.
  4. Link with title and image with missing alt: Even more ideal because now the title is being announced instead of the image filename, and the title in this case is more meaningful because it describes the link.
  5. Link with title and image with empty alt: This is even more ideal because semantically it is well-formed. Even though screen readers treat 4c and 5c the same, an HTML validator and most accessibility tools will flag 4c as malformed because of the missing alt tag.
  6. Link with title and image with meaningful alt: Less ideal again because, similar to 3c, the less meaningful bit of information is being announced while the more useful bit is being ignored. A title, though, is really meant for a sighted user, as it adds a bit of description to an element that is available to read visually once a mouse pointer rests on the element for a second. While the effect that it has in 4c and 5c is nice, it is not the best way to convey that information to screen readers.
  7. Link with title, image with meaningful alt, adjacent text: More ideal again because the adjacent text content is now read in favor of the title. That text is presumably more important: it is readily visible to sighted users and therefore should be read out loud to non-sighted users. In many cases, authors make the title text and the content of a link identical, which is not a great experience for anybody. Here, skipping the title avoids repeating the visible content that the link contains. In our opinion, the title text is not so much being ignored as never meant for screen readers in the first place.
  8. Link with aria-label, image with meaningful alt, adjacent text: The most ideal scenario, even though there are major differences between the Windows and Mac screen readers. An aria-label is a way to get important content to screen readers specifically, and the two Windows readers heed the content of the aria-label, ignoring the alt text and the content of the link itself — which means we should proceed with caution when using aria-labels. They are powerful, so, if used, their content needs to be descriptive, not generic. VoiceOver, meanwhile, ignores the aria-label but still announces content that makes sense for the listener, provided that the content of the link is meaningful and descriptive.
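
The article teaser mentioned at the start of this section could be sketched along these lines. Every URL, file name, and headline here is a placeholder, and neither variant is one of the exact rows tested above, so treat both as starting points to verify with a screen reader rather than as confirmed results.

```html
<!-- Variant 1: the headline inside the link names the destination,
     so the thumbnail is treated as decorative and gets an empty alt -->
<article>
  <a href="/articles/example-article">
    <img src="example-thumbnail.jpg" alt="">
    <h2>Example Article Headline</h2>
  </a>
  <p>A short excerpt from the article.</p>
</article>

<!-- Variant 2 (in the spirit of 8c): a descriptive aria-label.
     JAWS and NVDA read the label instead of the link content,
     so the label must be specific, not generic. -->
<a href="/articles/example-article"
   aria-label="Read the article: Example Article Headline">
  <img src="example-thumbnail.jpg" alt="Example Article Headline">
  Read more
</a>
```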

In our testing we have come to the conclusion that VoiceOver is trying to replicate the visual experience more than the other two solutions. VoiceOver ignores HTML attributes that are specifically designed for assistive technology, opting instead to read whatever is visually available on the page. It ignores aria-label content, and it ignores aria-hidden="true", which means that it announces elements that are visible but explicitly hidden from screen readers. The other two have been around longer and are much more respectful of content that developers add to the page to give assistive technology a more defined experience.

Our final table shows those differences with some common attributes that aim to work with assistive technology. Screen reader software makers have a long way to go before all of our tools are being used properly:

| Attribute or property | JAWS | NVDA | VoiceOver |
|---|---|---|---|
| 1d. CSS: speak:none; | Ignored | Ignored | Ignored |
| 2d. CSS: voice-family: [value]; | Ignored | Ignored | Ignored |
| 3d. HTML: aria-label="[content]" | Announced | Announced | Ignored |
| 4d. HTML: aria-hidden="true" | Honored (not announced) | Honored (not announced) | Ignored (announced) |
| 5d. HTML: role="presentation" | Honored (not announced) | Honored (not announced) | Ignored (announced) |

  1. speak: none: CSS gives developers a few extras that are supposed to work towards designing the “out loud” experience. Sadly, they don’t work with these screen readers.
  2. voice-family: The voice-family property is supposed to allow the designer/developer to control the voice of the screen reader. But, arguably, given what we know about how users prefer their screen reader to be set a certain way, in the voice they have gotten used to, at the cadence they prefer, why should we try to override that?
  3. aria-label: This attribute is a way to give additional context to a screen reader. As you saw in our examples from table C, it can be quite useful for serving specific content to the screen reader experience. It makes us a bit crazy to see how VoiceOver ignores all of these attributes.
  4. aria-hidden: Sometimes you want an element to be hidden from a screen reader because it adds nothing to the non-visual experience or would only confuse the listener. Again, it is crazy to us that VoiceOver announces these elements anyway.
  5. role="presentation": This attribute should be used when an element with semantic value is used for presentation purposes, such as an unordered list used for navigation. We saw earlier how lists are announced to the listener (“List with 8 items”). If desired, role="presentation" attempts to tell the screen reader, “Hey, this element has no semantic value here; don’t announce its presence, only announce the content.” Again, VoiceOver ignores this attempt to remove semantic meaning, but it also does not announce all semantic elements like the other two.
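
For reference, the two HTML attributes from rows 4d and 5d might be applied as in the sketch below. The navigation links and the star character are made-up placeholders, and the comments describe what each attribute requests, not what every screen reader will actually do (see the table above).

```html
<nav>
  <!-- role="presentation" asks the screen reader to drop the list
       semantics and read only the links inside -->
  <ul role="presentation">
    <li><a href="/">Home</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</nav>

<!-- aria-hidden="true" asks assistive technology to skip this
     purely decorative element -->
<span aria-hidden="true">★</span>
```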

How descriptive does your “alt” text need to be?

The final topic I want to cover is more subjective. The answer here is, of course, it depends. In general, a non-sighted user is going to want a level of detail that supports the larger story being told on the page. Sometimes, the details are going to matter. Other times, the overall theme or mood of the image is what matters most. Consider this image:

How would you describe this image?
  • Should you describe the focus in this image — the fact that some parts are in focus and some are blurry and out of focus?
  • Should you describe the colors in this image? The saturation?
  • Should you describe the mood that the image is trying to evoke?
  • Should you describe the details — what the child is wearing, what the shirt says?
  • Should you describe the ethnicity of the child? The perceived gender?

For sighted users, the picture says a thousand things without saying any of them out loud. If chosen well, it supports the story that the rest of the words on the page are telling. For non-sighted users, they will need that “alt” description to understand the same story that sighted users are getting. You as the page author are going to have to make some decisions about how to tell that story, and with what amount of detail. Often, those details are the reason why you chose that photo in the first place.

For this article, I decided to describe the image somewhat poetically: “A small child happily runs through the mist of a sprinkler on a sunny day. The overall light is yellow and looks warm, and the water droplets are hanging in the air catching the color of the sun.”
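
In markup, that description would sit in the image’s “alt” attribute, roughly as follows. The file name is a placeholder, since the original image path is not shown here.

```html
<!-- The file name is hypothetical; the alt text is the
     description chosen for this article -->
<img
  src="child-in-sprinkler.jpg"
  alt="A small child happily runs through the mist of a sprinkler on a sunny day. The overall light is yellow and looks warm, and the water droplets are hanging in the air catching the color of the sun.">
```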

Wrapping it all Up

The developer community knows how important accessibility is to our users. We can follow all the documented best practices and follow all the advice we can find, but the fact is, we need to test with assistive technology and have the same experience as a non-sighted user to more deeply understand the “out loud” experience. And unfortunately, while browsers don’t have as many differences of opinion as they used to, assistive technology is still at a point where there is plenty of disagreement between the major players such that it requires we take extra care in our approach to a problem.

In short, turn VoiceOver on every now and then. If you have access to a Windows machine, download and use NVDA. The more we understand the “out loud” experience, the better we can design and develop to support it. If you would like to talk to our team about Accessibility best practices, or making your website more accessible, please feel free to get in touch.



Notes about the Accessibility of this Article

As I was in the process of learning and testing various screen readers, I was also learning more about how to code an article like this in the most accessible way possible. The tables are made accessible with <caption> elements along with <th> elements that have row and column scope. I made a decision on how to handle the constant repetition of abbreviated terms (like HTML, JAWS, and NVDA), and I made use of aria-describedby attributes to link items in the table with their in-depth descriptions in the numbered lists below them. I hope I made decisions that increase the enjoyment and the comprehension of the content of this article for all users. If I got something wrong, or if you have a bit of advice on how I can improve the experience, reach out to me on Twitter.
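
A minimal sketch of that table pattern, using one row from the final table, might look like this. The caption wording and the id value are assumptions made for illustration; the article’s actual markup is not reproduced here.

```html
<table>
  <!-- The caption gives the table an announced name -->
  <caption>Support for attributes aimed at assistive technology</caption>
  <thead>
    <tr>
      <th scope="col">Attribute or property</th>
      <th scope="col">JAWS</th>
      <th scope="col">NVDA</th>
      <th scope="col">VoiceOver</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <!-- aria-describedby points at the longer explanation below -->
      <th scope="row" aria-describedby="desc-1d">1d. CSS: speak:none;</th>
      <td>Ignored</td>
      <td>Ignored</td>
      <td>Ignored</td>
    </tr>
  </tbody>
</table>

<ol>
  <li id="desc-1d">speak: none: CSS gives developers a few extras
    that are supposed to work towards designing the “out loud”
    experience.</li>
</ol>
```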

ARTICLE AUTHOR

J. Hogue

Director, Design & User Experience
