accessibility

3 posts

toss

Painting the Wheels of a Moving

Toss Design System (TDS) underwent its first major color system overhaul in seven years to address deep-seated issues with perceptual inconsistency and fragmented cross-platform management. By transitioning to a perceptually uniform color space and an automated token pipeline, the team established a scalable infrastructure capable of supporting the brand's rapid expansion into global markets and diverse digital environments.

### Legacy Issues in Color Consistency

* **Uneven luminosity across hues:** Colors sharing the same numerical value (e.g., Grey 100 and Blue 100) exhibited different perceptual brightness levels, leading to "patchy" layouts when used together.
* **Discrepancies between light and dark modes:** Specific colors, such as Teal 50, appeared significantly more vibrant in dark mode than in light mode, forcing designers to customize colors manually for each theme.
* **Accessibility hurdles:** Low-contrast colors often became invisible on low-resolution devices or in virtual environments, failing to meet consistent accessibility standards.

### Technical Debt and Scaling Barriers

* **Interconnected palettes:** Because the color scales were interdependent, modifying a single color required re-evaluating the entire palette across all hues and both light and dark modes.
* **Fragmentation of truth:** Web, native apps, and design editors managed tokens independently, leading to "token drift" where certain colors existed on some platforms but not others.
* **Business expansion pressure:** As Toss moved toward becoming a "super-app" and entering global markets, the manual process of maintaining design consistency became a bottleneck for development speed.

### Implementing Perceptually Uniform Color Spaces

* **Adopting OKLCH:** Toss shifted from traditional HSL models to OKLCH to ensure that colors with the same lightness value are perceived as equally bright by the human eye.
* **Automated color logic:** The team developed automation logic that derives accessible color combinations (backgrounds, text, and assets) from any input color, allowing third-party mini-apps to maintain brand identity without sacrificing accessibility.
* **Chroma clamping:** To ensure compatibility with standard sRGB displays, the system clamps chroma so that the intended hue and lightness are preserved even when a color exceeds the displayable gamut (a minimal clamping sketch follows at the end of this summary).

### Refined Visual Correction and Contrast

* **Solving the "dark yellow problem":** Because mathematically consistent yellow often appears muddy or loses its "yellowness" at higher contrast levels, the team applied manual visual corrections to preserve the color's psychological impact.
* **APCA-based dark mode optimization:** Using the Accessible Perceptual Contrast Algorithm (APCA), the team increased contrast ratios in dark mode to compensate for optical illusions and improve legibility at low screen brightness.

### Designer-Led Automation Pipeline

* **Single source of truth:** By integrating Tokens Studio (a Figma plugin) with GitHub, the team created a unified repository where design changes are synchronized across all platforms simultaneously.
* **Automated deployment:** Designers can now commit changes and open pull requests directly; pre-processing scripts then transform the tokens into platform-specific code for web, iOS, and Android without manual developer intervention (a token-transform sketch also follows below).

The transition to a token-based, automated color system demonstrates that investing in foundational design infrastructure is essential for long-term scalability.
For organizations managing complex, multi-platform products, adopting perceptually uniform color spaces like OKLCH can significantly reduce design debt and improve the efficiency of cross-functional teams.
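To make the OKLCH and chroma-clamping ideas concrete, here is a minimal Python sketch of one way to clamp chroma into the sRGB gamut while holding lightness and hue fixed. The conversion coefficients follow Björn Ottosson's published OKLab reference; the palette step, hue numbers, and lightness/chroma values are illustrative placeholders, not actual TDS tokens.

```python
"""Minimal sketch: clamping OKLCH chroma into the sRGB gamut while
preserving hue and lightness. Values are illustrative, not TDS tokens."""
import math

def oklch_to_linear_srgb(L, C, h_deg):
    # OKLCH -> OKLab
    h = math.radians(h_deg)
    a, b = C * math.cos(h), C * math.sin(h)
    # OKLab -> non-linear LMS, then cube to linear LMS
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    # Linear LMS -> linear sRGB
    r = +4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s
    g = -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s
    bl = -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s
    return r, g, bl

def in_srgb_gamut(rgb, eps=1e-6):
    return all(-eps <= c <= 1 + eps for c in rgb)

def clamp_chroma(L, C, h_deg, steps=30):
    """Binary-search the largest displayable chroma at fixed L and hue."""
    if in_srgb_gamut(oklch_to_linear_srgb(L, C, h_deg)):
        return C
    lo, hi = 0.0, C
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_srgb_gamut(oklch_to_linear_srgb(L, mid, h_deg)):
            lo = mid
        else:
            hi = mid
    return lo

def to_hex(rgb):
    def encode(c):  # linear -> gamma-encoded sRGB channel
        c = min(max(c, 0.0), 1.0)
        c = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
        return round(c * 255)
    return "#{:02x}{:02x}{:02x}".format(*map(encode, rgb))

if __name__ == "__main__":
    # Hypothetical "100-level" step: the same OKLCH lightness for every hue,
    # so a Blue 100 and a Grey 100 generated this way read as equally bright.
    for name, hue in [("blue", 262), ("teal", 180), ("yellow", 100)]:
        L, C = 0.92, 0.12  # illustrative values
        C_safe = clamp_chroma(L, C, hue)
        print(name, to_hex(oklch_to_linear_srgb(L, C_safe, hue)))
```

Because lightness is held constant across hues, every swatch produced this way is perceived as roughly equally bright, which is exactly the property the older HSL-based scale lacked.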
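The single-source-of-truth pipeline can likewise be sketched as a small transform script: one token file in, per-platform artifacts out. The JSON shape below loosely mirrors a Tokens Studio export, and the `TdsColors` object name and output formats are assumptions for illustration, not the team's actual pre-processing scripts.

```python
"""Minimal sketch of a design-token transform: one token JSON becomes
platform-specific outputs so web, iOS, and Android never drift apart."""
import json

TOKENS = json.loads("""
{
  "color": {
    "blue": { "100": { "value": "#e8f3ff", "type": "color" } },
    "grey": { "100": { "value": "#f2f4f6", "type": "color" } }
  }
}
""")

def flatten(node, path=()):
    """Yield (dotted-name, hex-value) pairs from the nested token tree."""
    if isinstance(node, dict) and "value" in node:
        yield ".".join(path), node["value"]
        return
    for key, child in node.items():
        yield from flatten(child, path + (key,))

def to_css(tokens):
    lines = [f"  --{name.replace('.', '-')}: {value};" for name, value in tokens]
    return ":root {\n" + "\n".join(lines) + "\n}"

def to_kotlin(tokens):
    lines = [
        f"    val {name.replace('.', '_').upper()} = Color(0xFF{value.lstrip('#').upper()})"
        for name, value in tokens
    ]
    return "object TdsColors {\n" + "\n".join(lines) + "\n}"

if __name__ == "__main__":
    pairs = list(flatten(TOKENS))
    print(to_css(pairs))     # shipped to web as CSS custom properties
    print(to_kotlin(pairs))  # shipped to Android (Compose-style, illustrative)
```

The design point is that designers edit only the token file; everything downstream is generated, so a renamed or retuned color cannot exist on one platform and not another.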

google

StreetReaderAI: Towards making street view accessible via context-aware multimodal AI (opens in new tab)

StreetReaderAI is a research prototype designed to make immersive street-level imagery accessible to the blind and low-vision community through multimodal AI. By integrating real-time scene analysis with context-aware geographic data, the system transforms visual mapping data into an interactive, audio-first experience. This framework allows users to virtually explore environments and plan routes with a level of detail and independence previously unavailable through traditional screen readers.

### Navigation and Spatial Awareness

The system offers an immersive, first-person exploration interface that mimics the mechanics of accessible gaming.

* Users navigate using keyboard shortcuts or voice commands, taking "virtual steps" forward or backward and panning their view in 360 degrees.
* Real-time audio feedback provides cardinal and intercardinal directions, such as "Now facing North," to maintain spatial orientation.
* Distance tracking tells the user how far they have traveled between panoramic images, while a "teleport" feature allows quick jumps to specific addresses or landmarks.

### Context-Aware AI Describer

At the core of the tool is a Gemini-backed subsystem that synthesizes visual and geographic data to generate descriptions.

* The AI Describer combines the current field-of-view image with dynamic metadata about nearby roads, intersections, and points of interest (see the prompt-assembly sketch after this summary).
* Two distinct modes cater to different user needs: a "Default" mode focused on pedestrian safety and navigation, and a "Tour Guide" mode that provides historical and architectural details.
* The system uses Gemini to proactively predict and suggest follow-up questions relevant to the current scene, such as details about crosswalks or building entrances.

### Interactive Dialogue and Session Memory

StreetReaderAI uses the Multimodal Live API to support real-time, natural-language conversations about the environment.

* The AI Chat agent maintains a context window of 1,048,576 tokens, allowing it to retain a "memory" of up to 4,000 previous images and interactions.
* This memory lets users ask retrospective spatial questions, such as "Where was that bus stop I just passed?", with the agent answering in relative directions based on the user's current location (a distance-and-bearing sketch also follows below).
* By tracking every pan and movement, the agent can surface details about the environment that were captured in earlier steps of the virtual walk.

### User Evaluation and Practical Application

Testing with blind screen reader users confirmed the system's utility in practical, real-world scenarios.

* Participants successfully used the prototype to evaluate potential walking routes, identifying critical environmental features such as the presence of benches or shelters at bus stops.
* The study highlighted the importance of multimodal input (combining image recognition with structured map data) in producing descriptions more accurate and reliable than image analysis alone.

While StreetReaderAI remains a proof of concept, it demonstrates that the integration of multimodal LLMs and spatial data can bridge significant accessibility gaps in digital mapping. Future implementations of these technologies could transform how visually impaired people interact with the world, turning static street imagery into a functional tool for independent mobility and exploration.
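As a rough illustration of the context-aware describer, the sketch below assembles a text prompt from the user's heading and nearby points of interest; in the described system this kind of context accompanies the field-of-view image sent to Gemini. The `Poi` fields, mode strings, and clock-direction phrasing are assumptions, not the StreetReaderAI implementation.

```python
"""Minimal sketch: building a context-aware description prompt from the
user's heading and nearby map metadata. All names and values are hypothetical."""
from dataclasses import dataclass

@dataclass
class Poi:
    name: str
    category: str
    distance_m: float
    bearing_deg: float  # absolute compass bearing from the user to the POI

MODE_PROMPTS = {
    "default": "Describe what matters for a blind pedestrian: crossings, "
               "obstacles, sidewalks, and entrances. Be concise.",
    "tour_guide": "Describe the scene like a tour guide: architecture, "
                  "history, and notable landmarks.",
}

def relative_clock(bearing_deg, heading_deg):
    """Convert an absolute bearing into a clock direction relative to the user."""
    rel = (bearing_deg - heading_deg) % 360
    hour = round(rel / 30) % 12
    return f"{12 if hour == 0 else hour} o'clock"

def build_describer_prompt(heading_deg, pois, mode="default"):
    lines = [MODE_PROMPTS[mode], f"The user is facing {heading_deg:.0f} degrees."]
    for p in pois:
        lines.append(
            f"- {p.name} ({p.category}), about {p.distance_m:.0f} m away, "
            f"at the user's {relative_clock(p.bearing_deg, heading_deg)}."
        )
    return "\n".join(lines)  # sent alongside the current field-of-view image

if __name__ == "__main__":
    pois = [Poi("Bus stop 41", "transit", 12, 95), Poi("Cafe Aroma", "food", 30, 350)]
    print(build_describer_prompt(heading_deg=90, pois=pois))
```

Grounding the prompt in structured map data is what lets the model answer "is there a crosswalk here?" more reliably than image analysis alone.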
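A retrospective question such as "Where was that bus stop I just passed?" ultimately reduces to distance and relative bearing between a remembered location and the user's current position and heading. The sketch below shows that geometry; the stored landmark coordinates and the sector wording are hypothetical, not the agent's actual memory format.

```python
"""Minimal sketch: answering a retrospective spatial question from stored
landmark coordinates plus the user's current position and heading."""
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360) % 360

def describe_relative(user_lat, user_lon, heading, lm_lat, lm_lon, name):
    dist = haversine_m(user_lat, user_lon, lm_lat, lm_lon)
    rel = (bearing_deg(user_lat, user_lon, lm_lat, lm_lon) - heading) % 360
    sector = ["ahead", "to your right", "behind you", "to your left"][int(((rel + 45) % 360) // 90)]
    return f"{name} is about {dist:.0f} m {sector}."

if __name__ == "__main__":
    # Hypothetical session-memory entry for a bus stop the user walked past.
    print(describe_relative(37.4221, -122.0841, heading=0,
                            lm_lat=37.4216, lm_lon=-122.0838, name="The bus stop"))
```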

google

Making group conversations more accessible with sound localization

Google Research has introduced SpeechCompass, a system designed to improve mobile captioning for group conversations by integrating multi-microphone sound localization. By shifting away from complex voice-recognition models toward geometric signal processing, the system provides real-time speaker diarization and directional guidance through a color-coded visual interface. This approach significantly reduces the cognitive load for users who previously had to manually associate a wall of scrolling text with different speakers in a room.

## Limitations of Standard Mobile Transcription

* Traditional automatic speech recognition (ASR) apps concatenate all speech into a single block of text, making it difficult to distinguish between participants in a group setting.
* Existing high-end solutions often require audio-visual separation, which needs a clear camera line of sight, or speaker embedding, which requires pre-registering unique voiceprints.
* These methods can be computationally expensive and often fail in spontaneous, mobile settings where privacy and setup speed are priorities.

## Hardware and Signal Localization

* The system was prototyped in two forms: a specialized phone case featuring four microphones connected to an STM32 microcontroller, and a software-only implementation for standard dual-microphone smartphones.
* While dual-microphone setups are limited to 180-degree localization due to "front-back confusion," the four-microphone array enables full 360-degree sound tracking.
* The system uses Time-Difference of Arrival (TDOA) and Generalized Cross-Correlation with Phase Transform (GCC-PHAT) to estimate the angle of arrival of sound waves (a minimal sketch follows this summary).
* To handle indoor reverberation and noise, the team applied statistical methods such as kernel density estimation to improve the precision of the localizer (also sketched below).

## Advantages of Waveform-Based Diarization

* **Low latency and compute:** By avoiding heavy machine learning models and weights, the algorithm can run on low-power microcontrollers with minimal memory requirements.
* **Privacy preservation:** Unlike speaker-embedding techniques, SpeechCompass does not identify unique voiceprints or require video; it relies purely on the physical location of the sound source.
* **Language independence:** Because the system analyzes differences between audio waveforms rather than the speech content itself, it is entirely language-agnostic and can localize non-speech sounds.
* **Dynamic reconfiguration:** The system adjusts instantly to movement of the device, allowing users to reposition their phones without recalibrating the diarization logic.

## User Interface and Accessibility

* The prototype Android application augments standard speech-to-text with directional data received via USB from the microphone array.
* Transcripts are visually separated by color and accompanied by directional arrows, allowing users to quickly see where a speaker is located in the physical space.
* This visual feedback loop turns a traditional transcript into a spatial map of the conversation, making group interactions more accessible for people who are deaf or hard of hearing.
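For readers curious about the signal-processing core, the following sketch estimates a time difference of arrival with GCC-PHAT and converts it to an angle of arrival for a two-microphone pair. The sample rate, 14 cm spacing, and synthetic test signal are assumptions for illustration, not SpeechCompass hardware parameters.

```python
"""Minimal sketch of waveform-based localization: GCC-PHAT delay estimation
for one microphone pair, then delay -> angle of arrival."""
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def gcc_phat(sig, ref, fs, max_tau=None):
    """Return the estimated delay (seconds) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-15            # phase transform: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

def angle_of_arrival(tau, mic_distance_m):
    """Map a delay to a (front/back ambiguous) angle for a 2-mic pair."""
    x = np.clip(tau * SPEED_OF_SOUND / mic_distance_m, -1.0, 1.0)
    return np.degrees(np.arcsin(x))

if __name__ == "__main__":
    fs, d = 16000, 0.14                        # 14 cm spacing, e.g. across a phone case
    rng = np.random.default_rng(0)
    src = rng.standard_normal(fs)              # 1 s of noise as a stand-in for speech
    delay_samples = 4                          # simulate sound reaching mic2 later
    mic1 = src
    mic2 = np.concatenate((np.zeros(delay_samples), src[:-delay_samples]))
    tau = gcc_phat(mic2, mic1, fs, max_tau=d / SPEED_OF_SOUND)
    print(f"delay ≈ {tau*1e6:.0f} µs, angle ≈ {angle_of_arrival(tau, d):.1f}°")
```

Note the front/back ambiguity in `angle_of_arrival`: a single pair can only resolve a half-plane, which is why the four-microphone case is needed for full 360-degree tracking.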
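Per-frame angle estimates are noisy indoors, so one plausible stabilization step, in the spirit of the kernel density estimation mentioned above, is to keep a short history of estimates and report the mode of their density. The sample angles below are made up; the real system's smoothing parameters are not specified in this summary.

```python
"""Minimal sketch: smoothing noisy angle estimates with a kernel density
estimate and reporting the most likely source direction."""
import numpy as np
from scipy.stats import gaussian_kde

def dominant_angle(angle_history_deg):
    """Return the most likely source angle from recent noisy estimates."""
    kde = gaussian_kde(angle_history_deg)
    grid = np.linspace(-90, 90, 361)           # 2-mic case: half-plane only
    return grid[np.argmax(kde(grid))]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Mostly ~38 degrees, with reverberation outliers scattered elsewhere.
    samples = np.concatenate((rng.normal(38, 4, 80), rng.uniform(-90, 90, 20)))
    print(f"dominant angle ≈ {dominant_angle(samples):.1f}°")
```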