
Accessibility Research
Evaluating AI Tools for Accessibility
My Role
UX Researcher, Prompt Engineer, Accessibility Research Facilitator
Project Timeline
April 2024
Tools Used
NVDA screen reader, Zoom, Adobe Acrobat, ChatGPT, Google Gemini
Context
Overview
This study investigates how well two AI tools—ChatGPT and Google Gemini—support blind and low vision (BLV) users in understanding content from inaccessible PDFs and image-based documents. I designed and facilitated user testing with a BLV participant to identify accessibility gaps and provide recommendations for inclusive AI interactions.
So why did I take on this study?
Many AI tools are being marketed as accessibility aids, but their actual performance for BLV users has not been thoroughly tested. This project explores how effectively ChatGPT and Gemini can help users understand structured documents (PDFs) and visual flyers (images), especially when accessed using screen readers.
Research Scope
Goals of the Study
Test how well ChatGPT and Gemini describe visual content to BLV users.
Identify interaction challenges when using AI tools with screen readers.
Evaluate response accuracy, including layout, structure, and spatial metaphors.
Questions Guiding the Study
Can AI tools access and summarize factual information from the web in a text-to-speech format?
How well do AI tools handle different content types (news articles, research papers, code)?
Can AI tools provide information in audio formats suitable for screen readers?
Do AI tools offer alternative output methods like Braille or haptic feedback (if applicable)?
The Study
Methodology
Participant
One legally blind user (Julian), using the NVDA screen reader
Setup
Remote session via Zoom; Julian shared his screen, and I observed the visuals while listening to his screen reader output
Materials
Remediated and non-remediated PDFs, scanned forms, flyer images
Tools Tested
ChatGPT (GPT Builder & Image Input), Google Gemini (Voice Input & Image Analysis)
Approach
Iterative prompt testing, replication across platforms, audio observation, visual cross-verification
The First Study
GPT Builder Configuration
"I am blind. I use screen readers to read PDF documents. The content of some PDF documents are not accessible to me when using my screen reader. I need a GPT that I can upload inaccessible PDF documents to. Then this GPT will recognize the content of that document. It generates a textual format for that document such that I can read with my screen reader."

The GPT returned a breakdown of the PDF, including field descriptions, a document summary, and a structural overview.
The preview section of GPT Builder was mislabeled as a button, which caused confusion for screen reader navigation.
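We never saw GPT Builder's underlying markup, but the pattern is a common one. As a purely illustrative sketch (hypothetical names, in TypeScript), this is the difference between a preview pane that announces as a single button and one exposed as a labeled region:

```ts
// Hypothetical sketch -- not GPT Builder's actual code, which we could
// not inspect. It shows how a preview pane can end up announced as a
// button, and how an explicit ARIA role and label avoid that.

// Problem pattern: a container with role="button" makes the screen
// reader announce the whole preview area as a single button.
const badPreview = document.createElement("div");
badPreview.setAttribute("role", "button");
badPreview.textContent = "…preview content…";

// Accessible pattern: a labeled landmark region lets NVDA users
// navigate into the preview and read its contents normally.
const goodPreview = document.createElement("section");
goodPreview.setAttribute("aria-label", "GPT preview");
goodPreview.textContent = "…preview content…";
```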
PDF of an image with text
Next, we tried a PDF that contained an image with text in it.

GPT failed to extract content and displayed a non-selectable error message: “No text could be extracted from this file.”
The screen reader could not access the error message, meaning Julian wouldn’t have known the cause without sighted assistance.
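This is a well-understood fix on the web side: error text inserted into an ARIA live region is announced automatically. A minimal sketch, assuming a standard HTML interface (not OpenAI's actual code):

```ts
// Sketch of a screen-reader-detectable error message, assuming a plain
// web UI. An element with role="alert" is an implicit assertive live
// region, so NVDA announces its text as soon as it is inserted.
function showExtractionError(container: HTMLElement, message: string): void {
  const alert = document.createElement("div");
  alert.setAttribute("role", "alert");
  alert.textContent = message; // real text, not an image or canvas overlay
  container.appendChild(alert);
}

// Usage: the same error Julian hit, but announced automatically.
// showExtractionError(document.body, "No text could be extracted from this file.");
```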
A scanned document
We uploaded a PDF of a document that had been scanned with a phone camera.

GPT failed to extract text from a scanned PDF.
It speculated about document content based on the file name.
When prompted to use OCR, GPT responded that it did not have OCR capabilities, which was disappointing for Julian.
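One workaround we did not get to test in the session would be running OCR locally before uploading, so the GPT receives plain text instead of a scanned PDF. A rough sketch using the open-source tesseract.js library, assuming the scanned page is first exported as an image:

```ts
// Untested workaround sketch: OCR the scan locally with tesseract.js,
// then hand the extracted text (not the scanned PDF) to the AI tool.
// Assumes the scanned page has been exported as an image (e.g. PNG).
import Tesseract from "tesseract.js";

async function ocrScannedPage(imagePath: string): Promise<string> {
  const result = await Tesseract.recognize(imagePath, "eng");
  return result.data.text; // plain text, readable by a screen reader
}

// ocrScannedPage("scanned-form.png").then(text => console.log(text));
```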
Reconfiguring the GPT
"When the PDF is uploaded, I would like my GPT to walk me through the document by reading the actual content from the document. Similar to how a sighted person may read the actual text of a document for a blind person."
We uploaded a new form, one I hadn’t seen before, to evaluate the GPT’s descriptive accuracy.
GPT read through the form content, and after its response, Julian sent me the file for visual verification.
We noticed that the GPT preserved text order accurately but failed to describe visual layout nuances such as right-aligned text.
The GPT also generated a summary of fillable fields at the end, which was particularly useful.
Findings
General Observations
The remediated PDF was significantly more accessible to the screen reader than the non-remediated version.
Citation accessibility issues were identified:
Not all in-text citations were linked to the reference section.
All citation links redirected to the top of the references page rather than to specific entries (see the sketch after this list).
Every citation was prefixed with the word “Note”, which was not present visually.
Image descriptions provided by GPT were rudimentary but functionally helpful.
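The study documents were PDFs, but the linking principle is easiest to show in web terms. A hedged, hypothetical illustration of what correctly targeted citation links look like (the ids and labels are made up):

```ts
// Hypothetical fix for the citation issue noted above: each in-text
// citation should target a unique id on its own reference entry, not
// the top of the references section.
function linkCitation(citation: HTMLAnchorElement, refId: string): void {
  citation.href = `#${refId}`; // e.g. "#ref-12", not "#references"
  // A meaningful label, instead of an unexplained "Note" prefix.
  citation.setAttribute("aria-label", `Citation, jump to reference ${refId}`);
}
```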
GPT-Specific Observations
GPT was able to parse text-based PDFs and summarize structure and form fields effectively.
GPT failed to handle:
PDFs with no extractable text (e.g., scanned files).
Visual formatting such as alignment or styling.
Error messages that weren’t screen reader accessible.
GPT made reasonable speculative inferences based on file names, which could mislead users if not clearly framed.
The Second Study
Experimenting with Image descriptions
Google Gemini
"Tell me about the visual form and shape of the Francis Scott Bridge in Baltimore. I'm visually impaired and I need a descriptive description about that."
Interaction Issue
Julian used the "Listen and Send" feature, expecting the prompt to auto-send after speaking.
Instead, Gemini required pressing the mic button again to stop recording, followed by tapping the send button, which was non-intuitive and inaccessible.
Gemini produced fairly accurate descriptions of the bridge, using a rainbow analogy to explain the arch shape and calling it a “long metal walkway but for cars”.
It also correctly brought up the recent collapse of the bridge after a ship collision.
Result Highlight Warning
Some text in the response was highlighted with a warning about accuracy.
These highlights (red and green) were visual styling only, with no corresponding semantic markup in the DOM, making them undetectable to the screen reader.
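A highlight only needs a small amount of semantic markup to become perceivable. As a hypothetical sketch (we could not inspect Gemini’s actual implementation), the visual color can be kept while a visually hidden text prefix makes the warning detectable to NVDA:

```ts
// Hypothetical accessible highlight, assuming a standard HTML response
// view. The color stays for sighted users; a visually hidden prefix
// carries the same warning to screen reader users.
function markDisputedText(
  span: HTMLElement,
  verdict: "unverified" | "corroborated"
): void {
  const note = document.createElement("span");
  note.className = "visually-hidden"; // assumed CSS class that clips off-screen
  note.textContent = `(${verdict} statement) `;
  span.prepend(note);
  span.style.backgroundColor = verdict === "unverified" ? "#fdd" : "#dfd";
}
```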
Text Descriptions of a Flyer
We asked Gemini to describe the content of an image-based flyer provided to Julian.
Initial Results
It accurately extracted all the text from the image.
However, it did not describe the spatial layout or position of the text.
We followed up: “Where is the text located in relation to the graphical elements of the flyer?”
Gemini broke the flyer into sections and described their locations relative to one another, but the result was confusing: Julian was unable to picture graphical elements such as the laptop and smartphone.
Advanced Spatial Representation:
Julian asked: “Can you tell me where the laptop, pen, and phone graphics are located in the space relative to the main text content? Consider that I am legally blind; give me a spatial representation to understand how this flyer is visually laid out.”
Gemini responded with a "tic-tac-toe" analogy, dividing the flyer into a 3x3 grid for spatial reference.
Later, we prompted it to use a clock-face analogy, which is more intuitive for blind users.
Gemini successfully adapted to this and used the clock metaphor in subsequent responses.
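To illustrate why the clock metaphor is attractive as a spatial representation, here is our own sketch (not something either tool exposes) of mapping a normalized position on the flyer to a clock direction:

```ts
// Illustrative only: convert a normalized (x, y) position on the flyer
// (origin top-left, values in 0..1) to a clock-face direction relative
// to the center, the metaphor Julian found most intuitive.
function clockPosition(x: number, y: number): string {
  const dx = x - 0.5;
  const dy = y - 0.5;
  if (Math.abs(dx) < 0.05 && Math.abs(dy) < 0.05) return "center";
  // Clockwise angle from 12 o'clock; screen y grows downward, so up is -dy.
  let angle = Math.atan2(dx, -dy);
  if (angle < 0) angle += 2 * Math.PI;
  const hour = Math.round(angle / (Math.PI / 6)) % 12;
  return `${hour === 0 ? 12 : hour} o'clock from the center`;
}

// clockPosition(0.9, 0.5) -> "3 o'clock from the center" (right edge)
```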

Replicating Prompts in ChatGPT
In the spirit of research replication, we copied the same prompts and tested ChatGPT with the same flyer image.
Image Description Results:
ChatGPT accurately extracted all text from the image. It described colors, fonts, and graphical elements. When asked about text locations, it responded accurately and also included background colors, which Gemini had omitted.
Spatial Representation:
ChatGPT used the clock-face analogy effectively without needing as much prompting.
It provided a clear spatial understanding of the flyer layout, including the relative location of icons and text blocks.
Limitations:
ChatGPT can accept image files but not PDF files with embedded or scanned images.
When provided the flyer as a PDF image, it could not extract any text from it.
Findings
To summarize
| Feature/Task | ChatGPT | Google Gemini |
| --- | --- | --- |
| Text extraction from images | ✅ | ✅ |
| Text extraction from scanned PDFs | ❌ | ✅ |
| Layout/spatial description | ✅ (clock analogy) | ⚠️ With extra prompting |
| Color/styling description | ✅ Detailed | ❌ Limited |
| Voice input accessibility | N/A | ❌ Difficult for screen readers |
| Interface accessibility | Some unlabeled/wrongly labeled elements | Hidden highlights, confusing controls |