
Accessibility Research

Evaluating AI Tools for Accessibility

Describe Francis Scott Bridge

My Role

UX Researcher, Prompt Engineer, Accessibility Research Facilitator

Project Timeline

April 2024

Tools Used

NVDA screen reader, Zoom, Adobe Acrobat, ChatGPT, Google Gemini

Context

Overview

This study investigates how well two AI tools—ChatGPT and Google Gemini—support blind and low vision (BLV) users in understanding content from inaccessible PDFs and image-based documents. I designed and facilitated user testing with a BLV participant to identify accessibility gaps and provide recommendations for inclusive AI interactions.

So why did I take on this study?

Many AI tools are being marketed as accessibility aids, but their actual performance for BLV users has not been thoroughly tested. This project explores how effectively ChatGPT and Gemini can help users understand structured documents (PDFs) and visual flyers (images), especially when accessed using screen readers.

Research Scope

Goals of the Study

Test how well ChatGPT and Gemini describe visual content to BLV users.

Identify interaction challenges when using AI tools with screen readers.

Evaluate response accuracy, including layout, structure, and spatial metaphors.

Questions Guiding the Study

Can AI tools access and summarize factual information from the web in a text-to-speech format?

How well do AI tools handle different content types (news articles, research papers, code)?

Can AI tools provide information in audio formats suitable for screen readers?

Do AI tools offer alternative output methods like Braille or haptic feedback (if applicable)?

The Study

Methodology

Participant

One legally blind user (Julian), using the NVDA screen reader

Setup

Remote session via Zoom; I observed the screen reader output through Julian's shared screen

Materials

Remediated and non-remediated PDFs, scanned forms, flyer images

Tools Tested

ChatGPT (GPT Builder & Image Input), Google Gemini (Voice Input & Image Analysis)

Approach

Iterative prompt testing, replication across platforms, audio observation, visual cross-verification

The First Study

GPT Builder Configuration

"I am blind. I use screen readers to read PDF documents. The content of some PDF documents are not accessible to me when using my screen reader. I need a GPT that I can upload inaccessible PDF documents to. Then this GPT will recognize the content of that document. It generates a textual format for that document such that I can read with my screen reader."

Configuring the GPT

The GPT returned a breakdown of the PDF, including field descriptions, a document summary, and a structural overview.

The preview section of GPT Builder was mislabeled as a button, which caused confusion for screen reader navigation.

PDF of an image with text

Next, we tried a PDF that contained an image with text in it.

Text unable to be extracted

GPT failed to extract content and displayed a non-selectable error message: “No text could be extracted from this file.”

The screen reader could not access the error message, meaning Julian wouldn’t have known the cause without sighted assistance.
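For contrast, here is a minimal sketch of a screen-reader-detectable error. This is my own illustration assuming a web UI, not ChatGPT's actual implementation: rendering the message as real text inside an ARIA alert region places it in the accessibility tree, so NVDA announces it automatically.

```typescript
// Illustrative sketch, not ChatGPT's actual code: an error rendered as
// plain text inside role="alert" enters the accessibility tree, so a
// screen reader such as NVDA announces it the moment it appears.
function renderAccessibleError(message: string): string {
  // role="alert" is an implicit assertive live region: assistive
  // technology reads the message as soon as it is inserted into the DOM.
  return `<div role="alert">${message}</div>`;
}

console.log(renderAccessibleError("No text could be extracted from this file."));
```

The key point is that the message exists as selectable DOM text rather than a purely visual overlay, so a blind user can discover the failure without sighted assistance.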

A scanned document

We uploaded a document that had been scanned with a phone camera and converted to PDF format.

No OCR Capabilities

GPT failed to extract text from a scanned PDF.

It speculated about document content based on the file name.

When prompted to use OCR, GPT responded that it did not have OCR capabilities, which was disappointing for Julian.

Reconfiguring the GPT

"When the PDF is uploaded, I would like my GPT to walk me through the document by reading the actual content from the document. Similar to how a sighted person may read the actual text of a document for a blind person."

We uploaded a new form, one I hadn’t seen before, to evaluate the GPT’s descriptive accuracy.

GPT read through the form content, and after its response, Julian sent me the file for visual verification.

We noticed that the GPT preserved text order accurately but failed to describe visual layout nuances such as right-aligned text.

The GPT also generated a summary of fillable fields at the end, which was particularly useful.

Findings

General Observations

  1. The remediated PDF was significantly more accessible to the screen reader than the non-remediated version.

  2. Several citation accessibility issues were identified:

    1. Not all in-text citations were linked to the reference section.

    2. The links that did exist redirected to the top of the references page, not to specific entries.

    3. Every citation was prefixed with the word “Note”, which was not present visually.

  3. Image descriptions provided by GPT were rudimentary but functionally helpful.

GPT-Specific Observations

  1. GPT was able to parse text-based PDFs and summarize structure and form fields effectively.

  2. GPT failed to handle:

    1. PDFs with no extractable text (e.g., scanned files).

    2. Visual formatting such as alignment or styling.

  3. Its error messages were not screen reader accessible.

  4. GPT made reasonable speculative inferences based on file names, which could mislead users if not clearly framed.

The Second Study

Experimenting with Image descriptions

Google Gemini

"Tell me about the visual form and shape of the Francis Scott Bridge in Baltimore. I'm visually impaired and I need a descriptive description about that."

Interaction Issue

Julian used the "Listen and Send" feature, expecting the prompt to auto-send after speaking.

Instead, Gemini required pressing the mic button again to stop recording, followed by tapping the send button, which was non-intuitive and inaccessible.

Gemini produced fairly accurate descriptions of the bridge, using a rainbow analogy to explain the arch shape and calling it a “long metal walkway, but for cars.”

It also correctly brought up the bridge's recent collapse after being struck by a ship.

Result Highlight Warning

Some text in the response was highlighted with a warning about accuracy.

These highlights (red and green) were visually styled only; they did not appear in the DOM, making them undetectable to the screen reader.
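A hypothetical fix, sketched below as my own assumption about how such highlights could be marked up (this is not Gemini's actual markup): pair the visual style with status text that genuinely exists in the DOM, hidden visually but exposed to assistive technology.

```typescript
// Hypothetical markup, not Gemini's implementation: a highlight whose
// accuracy status is real DOM text. A visually hidden span keeps the
// note off-screen while leaving it readable by screen readers.
function accessibleHighlight(text: string, needsVerification: boolean): string {
  const note = needsVerification ? "Needs verification: " : "Corroborated: ";
  // "visually-hidden" refers to the common CSS pattern that removes text
  // from the visual layout without removing it from the accessibility tree.
  return `<mark><span class="visually-hidden">${note}</span>${text}</mark>`;
}
```

With this pattern, NVDA would read the status prefix aloud even though sighted users only see the colored highlight.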

Text Descriptions of a Flyer

Image Flyer

We asked Gemini to describe the content of an image-based flyer provided to Julian.

Initial Results

It accurately extracted all the text from the image.

However, it did not describe the spatial layout or position of the text.

Julian then asked: “Where is the text located in relation to the graphical elements of the flyer?”

Gemini broke the flyer into sections and described their locations relative to one another, but the result was confusing: Julian was still unable to picture graphical elements like the laptop and smartphone.

Advanced Spatial Representation:

Julian asked: “Can you tell me where the laptop, pen, and phone graphics are located in the space relative to the main text content? Consider that I am legally blind; give me a spatial representation to understand how this flyer is visually laid out.”

Gemini responded with a "tic-tac-toe" analogy, dividing the flyer into a 3x3 grid for spatial reference.

Later, we prompted it to use a clock-face analogy, which is more intuitive for blind users.

Gemini successfully adapted to this and used the clock metaphor in subsequent responses.
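The two metaphors map onto each other directly. As a rough illustration (my own sketch, not part of the study), each cell of the 3x3 grid corresponds to a clock-face position:

```typescript
// Illustration, not from the study: translating a 3x3 "tic-tac-toe"
// grid cell into the clock-face wording that Julian found intuitive.
type Row = "top" | "middle" | "bottom";
type Col = "left" | "center" | "right";

const CLOCK_BY_CELL: Record<string, string> = {
  "top,left": "10 o'clock",    "top,center": "12 o'clock",   "top,right": "2 o'clock",
  "middle,left": "9 o'clock",  "middle,center": "center",    "middle,right": "3 o'clock",
  "bottom,left": "8 o'clock",  "bottom,center": "6 o'clock", "bottom,right": "4 o'clock",
};

// Describe a grid position using the clock metaphor.
function gridToClock(row: Row, col: Col): string {
  return CLOCK_BY_CELL[`${row},${col}`];
}
```

So "top-right cell of the grid" becomes "2 o'clock", a frame of reference many blind users already use for orientation.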

Replicating Prompts in ChatGPT

In the spirit of research replication, we copied the same prompts and tested ChatGPT with the same flyer image.

Image Description Results:

ChatGPT accurately extracted all text from the image. It described colors, fonts, and graphical elements. When asked about text locations, it responded accurately and also included background colors, which Gemini had omitted.

Spatial Representation:

ChatGPT used the clock-face analogy effectively without needing as much prompting.

It provided a clear spatial understanding of the flyer layout, including the relative location of icons and text blocks.

Limitations:

ChatGPT can accept image files but not PDF files with embedded or scanned images.

When given the flyer as a PDF instead, it could not extract any text from it.

Findings

To summarize

| Feature/Task | ChatGPT | Google Gemini |
| --- | --- | --- |
| Text extraction from images | ✅ | ✅ |
| Text extraction from scanned PDFs | ❌ | Not tested |
| Layout/spatial description | ✅ (clock analogy) | ⚠️ With extra prompting |
| Color/styling description | ✅ Detailed | ❌ Limited |
| Voice input accessibility | N/A | ❌ Difficult for screen readers |
| Interface accessibility | Some unlabeled/wrongly labeled elements | Hidden highlights, confusing controls |

LET'S CONNECT!

Email Button
Linkedin Button
Github Button
Daisy outline

Made with 🤍 in ☀️ California
