and Individual User Interface Generators (IUIGs)
This page describes an alternate, hypothetical approach to accessibility. It will be continually updated as the concept evolves over time.
The field of ICT accessibility has seen significant progress over the last four decades. What started with “special devices for special people” evolved into public and company-specific accessibility guidelines, international standards, and accessibility laws both in Europe and the U.S. Many large companies have dedicated teams to improve accessibility and have built significant accessibility features directly into their products. The growing emphasis on accessibility in the industry has given rise to consultants, accessibility evaluation and remediation companies, and training programs aimed at developing, training, and certifying accessibility specialists.
Despite all of the progress in accessibility, however, there are still major shortcomings. Audits of the field reveal that a low percentage of websites and products are accessible. Moreover, while some products have built-in accessibility features, they are only accessible to some individuals with disabilities. For example, smartphone screen readers with their gesture controls are fantastic for some blind users but are too complicated or physically impossible for others who are blind. Additionally, many/most products don’t effectively address the range of cognitive, language, and learning disabilities, even though this is cumulatively the largest disability group.
While we’ve made great progress from essentially zero products accessible to anyone 40 years ago, today there are still only a fraction of products that are accessible. Even the best among these are still inaccessible to a wide range of individuals. In sum:
- There are no products that are accessible across all of the different types, degrees, and combinations of disability.
- There are a small number of products that are reasonably accessible across disabilities, but even those are only accessible to more typical or able individuals (e.g., those who are blind but are more digitally adroit, versus the full range of people who are blind and who may have other disabilities).
While it’s essential to continue moving forward with our traditional methods, there’s also a need to consider augmenting them with new approaches that:
- can reach the large number of individuals who are currently left out and
- require less effort so more companies are willing and able to make their products accessible.
Recent and emerging advances in technology may give us the tools to do this.
An alternate (supplemental) approach to accessibility.
What if the interfaces we experience fit us exactly, rather than our having to adapt to and understand them?
As an alternate approach to interface accessibility and usability, we propose a combination of InfoBot functionality with Individual User Interface Generators (IUIGs).
The Info-Bot function would use the standard human interface on the product as input. That is, the Info-Bot would require no special API. The API, if you will, would be the bit image of the interface (via screen capture or camera) and the keyboard, mouse, touchscreen or other standard input devices on the product. The Info-Bot would not be put into service until it was able to understand any digital interface it encountered at least as well as the median human being. Thus, if manufacturers can design a product that at least half of all people can understand, the Info-Bot functionality would be able to understand it from the standard human interface on the product.
The Info-Bot functionality would be coupled with Individual User Interface Generators (IUIGs) that would create an interface tailored to each person, for each device they encounter. Each time a person encountered a device with a digital interface, their personal IUIG would generate an interface for that product that was tailored to their particular abilities, preferences, and past experience (familiarity). It would not be a transformation or alternate presentation of the interface on the product (e.g., like a screen reader). It would be a bespoke interface design that was the optimum interface for that individual for the functionality of that product. Similar products would all have essentially identical interfaces. Different products would have interfaces that matched the functionality of those products but used familiar metaphors and elements, so that even a product never encountered before would be somewhat familiar and easier to learn and remember.
In short what is proposed is to:
- create a single, open-source intelligent agent – the InfoBot – that can be pointed at any interface and would be able to understand and operate the interface as well as 50% of the population. It would not be as smart as people – just as smart as the median person in figuring out that interface. The Info-Bot would be coupled to an individual user interface generator (IUIG).
- create a range of Individual User Interface Generators (IUIGs) that can take the information from the Info-Bot – and create an interface that would be tailored to an individual – for each product they encounter. Different people would have different IUIGs based on each person’s abilities, limitations, knowledge, background, culture, and preferences.
This approach would ensure that users consistently experience familiar, consistent interfaces, regardless of the devices they encounter and have to interact with – making technology more accessible and intuitive for everyone.
Additionally, this approach has the potential to reduce companies’ burden by eliminating each company’s need to understand, design for, and accommodate every type, degree, and combination of disability. Companies can still practice inclusive design to maximize the number of people who can directly use their products. But everyone else would also be able to use their products, even those the company isn’t able to design for.
The approach would simultaneously increase the percentage of products that are accessible, from both small and large companies, from about 3% today to near 100% (as long as the product is designed to be understood by 50% of the population, or 50% of the product’s users).
How it would work
The Info-Bot function would take any interface that it was exposed to, including an immersive 3D interface, and be able to understand and abstract it so that an alternate interface can be created by an Individual User Interface Generator (see IUIG below). That alternate interface may be totally different (to meet each user’s unique needs) but accomplishes the same functionality. For example, in the case of a user who was blind, a typical visual interface could be abstracted so that a completely optimized audio interface could be created from it by that person’s Individual User Interface Generator.
Since the Info-Bot would have the capabilities of the median user for perceiving and understanding digital interfaces, the Info-Bot would be able to perceive and understand any controls, text, and visuals that the median user is able to. Assuming that a product was designed to be usable by the median person (at least half of the population), the Info-Bot would work with any product with a digital interface. For specialized equipment (e.g., scanning electron microscopes), usability will depend on the complexity of the interface (not the product but the interface), the knowledge of the user, and any special ‘training’ that the Info-Bot has been given for that device’s interface.
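As a rough illustration of the abstraction step, the sketch below separates what a product can do from how it is presented, and then lets two different generators present the same functions. It is a minimal sketch under stated assumptions, not a proposed design: the class names, the example functions (start, power_level), and both renderers are hypothetical, and the hard part (deriving the abstract description from the pixels of the standard interface) is simply assumed to have already happened.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AbstractFunction:
    """One thing the product can do, independent of how it is presented."""
    name: str            # hypothetical identifier, e.g. "power_level"
    options: tuple = ()  # allowed values, empty for a plain action


@dataclass
class AbstractInterface:
    """What a hypothetical Info-Bot might hand to an IUIG."""
    product: str
    functions: list = field(default_factory=list)


def visual_iuig(ui):
    """Large-button layout: one labelled control per function."""
    return [f"[ {fn.name.replace('_', ' ').upper()} ]" for fn in ui.functions]


def audio_iuig(ui):
    """Spoken menu: numbered prompts, with any options read aloud."""
    prompts = []
    for number, fn in enumerate(ui.functions, start=1):
        label = fn.name.replace("_", " ")
        if fn.options:
            prompts.append(f"{number}. {label}: choose {', '.join(fn.options)}")
        else:
            prompts.append(f"{number}. {label}")
    return prompts


microwave = AbstractInterface(
    product="microwave",
    functions=[
        AbstractFunction("start"),
        AbstractFunction("power_level", ("high", "low")),
    ],
)
print(visual_iuig(microwave))
print(audio_iuig(microwave))
```

Both generators work from the same abstract description, so neither one is a translation of the other's presentation.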
Some potential characteristics of the proposed Info-Bot:
- free for all to use – as individuals – or for incorporation into product architectures.
- is able to understand interfaces at least as well as the 50th percentile human (the median)
- companies therefore would have to design a product that could be understood and used by at least half of the population (but no more) in order to have it understood by the Info-Bot
- if the Info-Bot could not understand some new feature of a company’s product, the Info-Bot would be open-source so the company can improve it so that it could.
- open source (perhaps LGPL) so all can improve it and all benefit from improvements. LGPL also important for compatibility and interoperability.
- actively supported by industry and government(s) so it stays up to date and functioning at “50% or better” level of the population
- since it would be a centralized resource, it can be exposed to new interfaces by manufacturers or others, and once it has seen and learned an interface, it knows it for all instances of the product being used by anyone, anywhere.
- since it would be open-source, if it has trouble with any completely new interface technique introduced by a company, the company (or others) can teach the Info-Bot about that interface technique. However, most new products use interface techniques known to 50% or more of the population even if the product is completely new.
- runs in cloud initially – but runs locally in future
- this allows continual updating from the cloud,
- but running locally can assure privacy, when there is no back-collection of data
- is initially separate from Individual User Interface Generators (IUIGs) but may merge with them later.
- is able to take output from the Individual User Interface Generators (IUIGs) and operate the product interface.
- this may be via API or direct simulation of human control movements that the product is expecting.
- funding to maintain the Info-Bot would come from government funding – or from industry as part of a social contract for industry being able to rely on the Info-Bot to address accessibility for those who cannot use the standard interface on their products.
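The "learn an interface once, know it everywhere" characteristic in the list above behaves like a shared cache of interface analyses. The sketch below shows only that caching behaviour. The class name, the product identifier used as a key, and the toy analysis function are all assumptions made for illustration; how a real Info-Bot would recognize that two devices are the same product is not addressed here.

```python
class SharedInterfaceMemory:
    """Hypothetical central store of interfaces the Info-Bot has already learned."""

    def __init__(self, analyse):
        self._analyse = analyse   # expensive step: capture -> abstract description
        self._learned = {}        # product identifier -> abstract description
        self.analyses_run = 0     # how many times the expensive step actually ran

    def understand(self, product_id, capture):
        # Analyse only the first time a product is seen; reuse the result afterwards.
        if product_id not in self._learned:
            self._learned[product_id] = self._analyse(capture)
            self.analyses_run += 1
        return self._learned[product_id]


def toy_analyse(capture):
    # Stand-in for real interface understanding: treat each text line as one control.
    return [line.strip() for line in capture.splitlines() if line.strip()]


memory = SharedInterfaceMemory(toy_analyse)
first_user = memory.understand("oven-model-A", "Start\nStop\n")
second_user = memory.understand("oven-model-A", "Start\nStop\n")
print(first_user == second_user, memory.analyses_run)
```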
Individual User Interface Generators (IUIGs) interact with the Info-Bot and create a custom interface for each individual and optimized for their abilities at that moment. This interface would not be a transformation of the original interface into a different modality (i.e. it would not be an auditory presentation of a visual interface like current screen readers). Rather it would be an interface that would be completely optimized for that individual. It would be, for example, the interface that standard products would have if everyone in the world were exactly like this individual.
Except it would be even better for the individual than that. Rather than being the interface for the average person without vision, it would be the interface for a person without vision that was just like them. If they were very bright, technically adroit, and loved massively parallel interfaces that would be what it would present. If they had trouble with technology and complexity it would present a different interface more tuned to their abilities, possibly involving more groups of fewer elements to choose from – and a more guided interface. The interfaces may be more direct command oriented or more interaction oriented depending on the preferences and skills of the user with each device.
Some characteristics of the envisioned Individual User Interface Generator (IUIG):
- each IUIG would be specific to an individual
- for products with the same functionality, but different interfaces, the IUIG would present the same (familiar, optimized) interface to the user – with just a different name.
- for example, the user would see the same interface for all microwave ovens – with the only difference being features added or missing if they were present or missing from a particular microwave.
- Ditto for all TV streaming services. The choices, favorites, continue watching, search, sign in, etc. would all be presented and operate the same familiar way.
- if a TV streaming service changed its interface – the IUIG interface would not change unless the user wanted it to. If there were new features – they would be added to an interface that otherwise did not change.
- products with very different functionality would have different interfaces, but they would operate with user controls and metaphors that were familiar to the user.
- initially – IUIGs would be hand-designed by experts – and individuals would select (or have selected for them) the IUIG that is best suited to their abilities and preferences.
- over time – AI could be used to help adapt and adjust IUIGs to be hyper-personalized for each individual user.
- If an IUIG doesn’t perfectly align with a user’s needs, they have the power to give feedback (e.g., “That was too fast,” “That was confusing,” or “Too many choices”) and have their feedback used to refine and enhance their IUIG. The focus here is on the individual’s preferences and their lived experiences.
- all changes to the IUIG behavior would be under user control. This includes the ability to explore different IUIGs to see if they like other interface approaches better, and the ability to reject any change suggested by or for the IUIG.
- Similar to today’s assistive technologies, IUIGs would include both free versions and commercial versions.
- IUIGs that would account for different languages, cultures, etc. would also be available.
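The microwave example in the list above (the same familiar interface for every microwave, differing only in features that are added or missing) can be sketched as a per-user template that is filtered against what each product actually offers. This is a minimal sketch: the template contents, the function names, and the choice to append unfamiliar features at the end are assumptions, not part of the proposal.

```python
# The user's own, fixed ordering of controls for one product category (hypothetical).
MICROWAVE_TEMPLATE = ["start", "stop", "set_time", "power_level", "defrost"]


def build_interface(template, product_functions):
    """Keep the user's familiar order, drop what this product lacks,
    and append features the template has never seen before."""
    available = set(product_functions)
    familiar = [name for name in template if name in available]
    extras = sorted(available - set(template))
    return familiar + extras


basic_oven = ["stop", "start", "power_level", "set_time"]
fancy_oven = ["steam_clean", "defrost", "set_time", "start", "stop"]

print(build_interface(MICROWAVE_TEMPLATE, basic_oven))
print(build_interface(MICROWAVE_TEMPLATE, fancy_oven))
```

Because the ordering comes from the user's template rather than from the product, a manufacturer rearranging its own controls would not rearrange what the user sees.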
IUIGs would present the interface for any product the individual encounters in the form best suited to them. This may be visual, voice, tactile, simple, complex, few choices at a time, many choices at a time, using gestures, not requiring gestures, with large controls, with tiny controls requiring minimum movement, operable with eye-gaze, operable with thought, etc.
The key is that the individual would have an interface they could use with ANY product they encounter without the product manufacturer needing any understanding of their particular type, degree, or combination of abilities/disabilities.
IUIGs could be created by assistive technology vendors, researchers, consumers, family members, disability organizations, or anyone else with the skills required – to meet the needs of a person, or people with different types, degrees, and combinations of disabilities.
At first, IUIGs would likely resemble the current spectrum of interfaces provided by assistive technologies. And they would leave the same gaps as our current assistive technologies do, reflecting our lack of understanding of exactly how to design interfaces for many underserved groups. There are excellent general descriptions of the types of things that would help different underserved groups [refs] but not specific designs for each different type, degree, and combination of disability, for each different product. Later, as we carry out more research and develop targeted IUIGs for particular people in response to this new capability, a richer array of IUIGs will emerge that can address previously unaddressed and under-addressed users. The incorporation of AI may also allow us to address the problems presented by people whose abilities are changing rapidly, or whose abilities change from day to day, or even over the course of each day.
It is important to note that although the Info-Bot and the Individual User Interface Generators (IUIGs) would be separate in the beginning – they may eventually merge in order to provide a more optimum information exchange or tighter integration.
One challenge that presents itself with the Info-Bot/IUIG approach is the activation of the standard interface by the Info-Bot. Once the user makes their choice or otherwise operates their IUIG interface, those actions need to be communicated back to the standard product interface.
There are currently three approaches being discussed for doing this. Two are API free and one uses a very simple API that requires no knowledge of disability.
Direct activation by the Info-Bot. The Info-Bot itself could be outfitted with manipulators that are capable of operating the standard user interface directly. The BrushLens and TouchA11y projects both involve a device that scans a touchscreen, provides an alternate (non-visual) interface to the user, and then activates the touchscreen, but they use two different mechanisms to do so. TouchA11y uses an extendable tape on a swivel that extends and positions a touch probe over the desired button and then activates it. With BrushLens, the device instructs the user to move their device to the approximate location on the screen, and then whichever autoclicker in an array on the device is over the desired button is activated, which registers a touch on that specific part of the touchscreen. For the universal operation of human interfaces, a manipulator that can provide a wider range of human articulation may be required.
Cuing of the user. A second approach involves directing the user to operate the device. After the user makes their choices on the IUIG, the Info-Bot directs them in operating the standard interface. It might use a directed beam of light, where the user simply pushes whichever buttons or keys the Info-Bot shines a light on. It might do this with actual light, or it may just highlight the buttons on screen with a “virtual ring light” while the person looks at the augmented video on screen as they use their hand to push, twist, flip or otherwise operate physical controls. For individuals who are blind, it can use audio cues, including spatial cues (if they have good binaural hearing), tones, and beep rates, to direct the user’s hand movements. In a very short period of time the cues would become almost reflexive and quick, without requiring thought or interpretation of the audio cues.
Very simple X-Y-Z API. The third approach does require an API, but a very simple one that can be implemented by engineers with no knowledge of disability or accessibility. It can also provide a very valuable interface for product testing, so it will have benefits to the company besides accessibility. With this approach each product (with a two-dimensional interface) would be able to accept an x and y position and an action at that position. If the product was a touch screen with simple button presses, then it would only have to accept an x and y coordinate and a ‘press’ or ‘click’ command. If the length of time a button was pressed also had meaning, it would include ‘press-down’ and ‘press-release’, etc. Each product would also include two reference points visible to cameras that the Info-Bot could use as references for the x-y coordinates. A very simple yet secure method for connecting this product interface to the Info-Bot will also be required for security: perhaps a dynamic number/QR code and a lockout to allow only local operation and only one person access at a time, or some other method.
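One way the audio cues described above might be derived is to map the offset between the user's hand and the target control onto stereo position, pitch, and beep rate. The mapping below is purely illustrative: the function name, the 30 cm reach, the 440 Hz base tone, and the 1 to 10 beeps per second range are all assumed values, and it presumes hand and target positions are already known in centimetres with y increasing upward.

```python
import math


def audio_cue(hand_xy, target_xy, reach_cm=30.0):
    """Turn the hand-to-target offset into hypothetical audio cue parameters."""
    dx = target_xy[0] - hand_xy[0]
    dy = target_xy[1] - hand_xy[1]
    distance = math.hypot(dx, dy)

    # Spatial cue: -1.0 is fully left, +1.0 is fully right.
    pan = max(-1.0, min(1.0, dx / reach_cm))
    # Tonal cue: up to one octave above or below the base tone (target higher = higher pitch).
    pitch_hz = 440.0 * 2 ** max(-1.0, min(1.0, dy / reach_cm))
    # Beep rate: faster as the hand gets closer to the target.
    closeness = 1.0 - min(distance, reach_cm) / reach_cm
    beeps_per_second = 1.0 + 9.0 * closeness

    return {"pan": pan, "pitch_hz": pitch_hz, "beeps_per_second": beeps_per_second}


print(audio_cue(hand_xy=(0.0, 0.0), target_xy=(15.0, 0.0)))
```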
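To make the third approach more concrete, the sketch below shows one possible shape for such an endpoint. Everything named here is an assumption made for illustration (the XYEndpoint class, the six-digit pairing code, the action names beyond those in the text). It also assumes the camera views the product roughly straight-on, so that the two reference points are enough to recover scale, rotation, and offset; it does not correct for perspective.

```python
import secrets

ACTIONS = {"press", "press-down", "press-release"}


def camera_to_product(cam_ref1, cam_ref2, prod_ref1, prod_ref2):
    """Build a converter from camera pixels to product coordinates using
    two reference points (a 2D similarity transform, no perspective)."""
    c1, c2 = complex(*cam_ref1), complex(*cam_ref2)
    p1, p2 = complex(*prod_ref1), complex(*prod_ref2)
    scale_and_rotation = (p2 - p1) / (c2 - c1)

    def convert(cam_xy):
        p = p1 + scale_and_rotation * (complex(*cam_xy) - c1)
        return (p.real, p.imag)

    return convert


class XYEndpoint:
    """Hypothetical product-side endpoint: accepts an x, a y, and an action."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._session = None
        self.received = []          # stands in for driving the real interface
        self._new_pairing_code()

    def _new_pairing_code(self):
        # Dynamic code the product could show on screen or as a QR code.
        self.pairing_code = f"{secrets.randbelow(10**6):06d}"

    def connect(self, code):
        if self._session is not None:
            raise PermissionError("endpoint is already in use")  # one user at a time
        if not secrets.compare_digest(code, self.pairing_code):
            raise PermissionError("wrong pairing code")
        self._session = secrets.token_hex(8)
        return self._session

    def disconnect(self, session):
        if session == self._session:
            self._session = None
            self._new_pairing_code()

    def act(self, session, x, y, action):
        if self._session is None or session != self._session:
            raise PermissionError("not connected")
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if not (0 <= x <= self.width and 0 <= y <= self.height):
            raise ValueError("position is outside the interface")
        self.received.append((action, x, y))


convert = camera_to_product((100, 50), (500, 50), (0, 0), (200, 0))
endpoint = XYEndpoint(width=200, height=100)
session = endpoint.connect(endpoint.pairing_code)
endpoint.act(session, *convert((300, 150)), "press")
print(endpoint.received)
```

Nothing in the endpoint refers to disability or accessibility, which is the point of this approach: the same interface could also be driven by an automated product test rig.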
Section 4: Benefits of the alternate approach
The benefits from such an approach would include benefits to users, product designers and government/society.
Benefits from a user perspective:
Some of the benefits to users would include near total accessibility; universal compatibility; a unified interface for similar products; control over unsolicited changes; a standardized mental model; adaptability; addressing cognitive, language and learning needs; and reducing the learning curve.
Near Total Accessibility: The IUIG and InfoBot would provide a major leap forward for accessibility of all products. The accessibility gap would no longer be widening at an ever increasing rate, and accessibility penetration would significantly increase from 3% to nearer 100%, giving people with disabilities the same opportunities and product access as everyone else. For those who struggle with technologies in their lives, technologies that they are forced to use in order to live independently but cannot understand or use, this would be a sea change.
Universally Compatible: Because it would rely only on the standard interface, this strategy does not require any special API to work. The system would be designed to work on all products everywhere and would offer consistent and instantaneous accessibility across devices and applications. It would not provide access to just some devices but to all devices with digital (and some other) interfaces they encounter.
Unified Interface for Similar Products: For products with the same or similar functions, users would need to learn only one interface. For example, let’s say someone is trying to navigate a newly downloaded streaming service app (let’s call it UltraFlix or UniChannel). All streaming services fundamentally offer the same functions (selecting and playing streamed content) but the user experience is drastically different for each. Every app presents its unique menu, navigation structure, and interaction. Downloading a new one means the person has to learn a new interface despite the fact that it does basically (or identically) the same thing as the interface the user already knows. However, when using the InfoBot/IUIG, instead of grappling with the different interfaces and confusing designs, the user would be offered a consistent and familiar experience that works the same way across all of the similar services.
Control over [Unsolicited] Changes: Even if the user never buys a new service, they can find that the service they already use drastically changes its interface. If the interface of a product undergoes an update or change, the interface the IUIG generates would remain unchanged for the user, offering a seamless experience. The system could prompt users to try out the new interface features, and even new interface approaches. But these would be offered, not forced, with the user able to accept, reject, or accept and later revert to the old way, as they choose. Innovation would not be stifled; rather, unsolicited innovation would be managed.
Preferably, any auto-adaptation to the IUIG interface itself would be subtle and non-disruptive. One goal is to avoid solutions that users find irritating, like unsolicited and persistent help agents. Additionally, interfaces would not change abruptly. Even if the change is objectively “better” (e.g., more efficient), some people have preferences for a certain way of doing things. In these cases, change can be the worst thing for them. The bottom line would be user control of any unnecessary change.
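The accept, reject, and accept-then-revert choices described above can be sketched as a small amount of state kept by the IUIG. This is a minimal sketch with hypothetical names (PersonalInterface, offer, accept, reject, revert); the layouts are plain lists of control names standing in for a real interface.

```python
class PersonalInterface:
    """Hypothetical IUIG state: changes are offered to the user, never forced."""

    def __init__(self, layout):
        self.layout = list(layout)
        self.pending = []     # offered changes: (description, proposed layout)
        self._history = []    # earlier layouts, so an accepted change can be undone

    def offer(self, description, new_layout):
        # An upstream redesign or AI suggestion lands here; the layout is untouched.
        self.pending.append((description, list(new_layout)))

    def accept(self, index=0):
        _description, new_layout = self.pending.pop(index)
        self._history.append(self.layout)
        self.layout = new_layout

    def reject(self, index=0):
        self.pending.pop(index)

    def revert(self):
        # Go back to the layout in use before the most recent accepted change.
        if self._history:
            self.layout = self._history.pop()


ui = PersonalInterface(["search", "favorites", "continue_watching"])
ui.offer("Service added a 'live' tab", ["search", "favorites", "continue_watching", "live"])
print(ui.layout)   # unchanged until the user decides
ui.accept()
ui.revert()        # the user tried it and went back to the old way
print(ui.layout)
```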
Standardized Mental Model: Whether adjusting a thermostat, setting a microwave, or navigating a television interface, the user wouldn’t have to understand multiple interface techniques for the same interface function (e.g., pick from list). The IUIG would present familiar interface elements across the devices, whether it’s pull-down menus or twisting dials, thus standardizing an individual’s user experience across different devices.
Adaptability: In the beginning, experts would design the initial user interfaces. However, as time progresses, interface designs that started off as expert designs could incorporate AI to allow them to evolve to fit individual needs better and to adapt as user needs change. For example, as someone gains new skills, the interface might adjust accordingly. Conversely, if someone’s abilities decline or if they’re struggling due to aging or other factors, the interface could adapt in different ways to accommodate those changes.
Cognitive, Language, and Learning Disabilities: Our cognitive abilities aren’t static. They fluctuate, sometimes within moments. On some days, we may grasp concepts faster, while on others, we might need a little more time. In addition, we’ve all experienced moments of not understanding a concept until it’s broken down for us, or someone provides a concrete example. Then suddenly we understand the concept more fully or in a more abstract or broad context. Especially with individuals who have cognitive, language, and learning disabilities, presenting information at a lower understandable level first often enables them to then understand it in a higher-level presentation. With the InfoBot, interfaces could start at a level the user understands and then gradually increase in complexity as a user grasps the concept – allowing every user to engage with technology effortlessly and equitably – simpler at first and more fully as they develop understanding.
Reducing the Learning Curve: Currently, many interfaces present a steep learning curve, leaving users feeling lost or overwhelmed. Users should not be required to expend their mental energy navigating complicated interfaces. Instead, their focus should be solely on their intended task, be it content creation, communication, or any other endeavor. Just as with introducing content at a lower level and then raising it, IUIGs could start out by using interface paradigms that are simpler or already familiar to the user when they encounter a new device or a new task. Then, as a person achieves skill and understands the task – more efficient interface elements could be introduced for adoption or rejection by the user.
Benefits to industries
Some of the benefits to industry would include a decreased accessibility burden, a simplified design process, higher compliance and reduced litigation risks, scalability, and wider market reach.
Less Burden on Industry: The InfoBot & IUIG would not require anyone – manufacturers, designers, developers, etc. – to have a deep understanding of accessibility or have disability expertise. This approach could also reduce the burden of constantly training staff due to organizational changes or turnover. Instead of depending on manufacturers, the mediation layer introduced by the InfoBot/IUIG would be applied post-manufacture. In other words, after the product is already manufactured and in the user’s hands, the Info-Bot/IUIG could provide access – by ‘understanding’ the product’s standard interface and ‘translating’ it into a format the user can interact with based on their personal preferences and needs.
This does not mean that companies should not continue to create products that are accessible out of the box for as many people as they can. It does mean however that they would be able to reach a much broader range of users – and have a safety net for those who have not been able to use their products – no matter how hard they have tried. This includes many who may not identify with having a disability – but simply find products too confusing or difficult to use.
Simplified Design Process: Designers could focus on what they do best without trying to learn and design for every type, degree, and combination of disability. By providing a framework where the InfoBot can provide access, they would reach their current and a much broader audience.
Higher Compliance and Reduced Litigation Risks: Industries could achieve higher compliance levels in accessibility standards and prevent lawsuits. The current model is to build in as much accessibility to as many people as possible – and rely on assistive technologies for those for whom built-in accessibility is not possible or practical. This same model can be extended to cover a much wider range of users by having the Info-Bot and IUIG act as a sort of super-AT to provide an alternate interface.
Helps Address the Closed Product or Closed Functionality problem. Currently, the “use assistive technologies for those who can’t use the product directly” approach is not open to products such as kiosks that do not allow connection of assistive technologies. Rather than leaving the AT approach unavailable for closed products (where there is no API or programmatic access), the Info-Bot can achieve “machine access” using only the standard human interface. Thus, with Info-Bot/IUIG, all products, including previously closed products, would be accessible.
Scalability and Wider Market Reach: The InfoBot & IUIG would be far more scalable (to a wider range of users) than current accessibility approaches. The range of users who can use a product would be limited only by the availability of IUIGs for different types of users, and the ability of the user to understand the underlying function of the product (e.g., even if provided with an accessible interface, many people may not be able to understand a quantum computer control panel sufficiently to operate it). Reaching a wider range of users, including those with multiple and cognitive disabilities as is common in older populations, can both increase profits and improve the brand’s reputation.
Benefits to Government/Society
The potential benefits to government and society include fewer regulations, fewer lawsuits and more people being able to participate in society.
Fewer regulations – Currently accessibility guidelines and regulations go into great detail in order to try to make sure that all of the different aspects of products are accessible to as many types, degrees, and combinations of disability as possible. As more and more types of ICT have emerged, including immersive technologies, and as more and more products are “closed” (i.e., they do not have a way for assistive technologies to connect and work with them), the creation of accessibility standards has become more complicated, and has put more and more requirements on industry in order to make these new technologies accessible.
Where assistive technology used to be able to be relied upon to provide access for many individuals with more severe disabilities, the lack of ability to use assistive technologies with newer closed products means that companies are being asked to provide those accessibility functions (that AT used to provide) as a standard part of their product. This is very complicated to do. It is also very difficult to do for all types, degrees, and combinations of disabilities. The result is both increasingly complex requirements, regulations, and demands on industry and decreasing accessibility to users. An Info-Bot/IUIG approach would remove the need for many of these new requirements to provide built-in equivalents to assistive technologies on closed products by eliminating the “closed” nature of the products. The Info-Bot would provide an assistive technology API, if you will, that only requires the standard human interface as input.
Fewer lawsuits – at least for Information and communication technologies (ICT). As accessibility gets more complicated and difficult – the ability of companies to provide the same level of accessibility gets more difficult – and compliance may continue to be low, or even drop further. By creating a means for products with ‘median’ usability to be accessible, it could significantly reduce the effort needed by industry, while greatly increasing the number of people who can use the products. The result would be fewer lawsuits (around ICT accessibility).
More people would be able to use new technologies, and live, work, and participate more independently. There is both a fiscal and a quality-of-life cost to society when people are not able to learn, work and live successfully and independently. As we build digital interfaces into everything, we put those products beyond the understanding and operation of people who have trouble using, or cannot use, standard digital interfaces. To the extent that Info-Bot/IUIG can put these devices back within the reach of these people, we can increase the percentage of our population that is able to be successful in life and independent living.
Section 5: What is needed to make this possible
While the Info-Bot/IUIG concept is fairly simple, what it requires is less so. However, recent research and technological advancements hint at the feasibility of this vision. And we’re already seeing projects that are laying the groundwork for it.
Some of the advances that will be needed before this could be realized, though, include abstracting user interfaces; artificial intelligence; local artificial intelligence; user-interface understanding; mapping user intent into actions; content-based understanding; multimodal integration; understanding of how to design an interface for each and every individual; and automatic generation of user interfaces.
Abstracting User Interfaces
Abstracting user interfaces is an area of human-computer interaction (HCI) that focuses on designing interfaces in a way that separates the presentation of information from the underlying application functions. The idea is to allow for greater flexibility in presenting information to different users, on different devices, or in different contexts, without having to redesign the entire application. Progress in this area included ISO 24752 – a standard for a “Universal Remote Console” (URC). The URC was an effort to standardize and abstract user interfaces so they can be easily personalized and adapted to different user needs and devices. ISO 24752 provided a framework for defining user interfaces in a way that separates the presentation and interaction from the underlying function of a device or service. However, one of the limitations of 24752 was that it was impractical for every manufacturer and product designer to integrate a standard interface socket across their devices. Furthermore (and the primary reason that it failed) manufacturers were resistant to having someone “control our product while looking at someone else’s logo”. The Info-Bot/IUIG approach circumvents this and requires no API. However, the ISO 24752 work highlights the complexities of creating an abstract user interface socket – and something along this line may be required for communication between the Info-Bot and the IUIGs.
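To give a feel for what communication between the Info-Bot and an IUIG might involve, the sketch below exchanges a JSON description of a product's functions and checks a request from the IUIG against it. This is not the ISO 24752 socket format: the message layout, field names, and function names are hypothetical and chosen only to illustrate separating function from presentation.

```python
import json


def describe(product, functions):
    """Info-Bot side: serialize what the product can do (no presentation details)."""
    return json.dumps({"product": product, "functions": functions})


def is_valid_request(description_json, request_json):
    """Check that an IUIG request names a known function and an allowed value."""
    functions = json.loads(description_json)["functions"]
    request = json.loads(request_json)
    spec = functions.get(request.get("function"))
    if spec is None:
        return False                      # the product has no such function
    options = spec.get("options")
    return options is None or request.get("value") in options


description = describe(
    "thermostat",
    {"power": {"options": ["on", "off"]}, "refresh": {}},
)
print(is_valid_request(description, json.dumps({"function": "power", "value": "on"})))
print(is_valid_request(description, json.dumps({"function": "power", "value": "maybe"})))
```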
Artificial Intelligence
Advanced generative capabilities will be key for dynamic interface creation. As modern interfaces increasingly merge visual, auditory, and tactile elements, AI systems need to seamlessly integrate information from these multiple modalities. Fortunately, work along these lines is already underway. However, current AIs do not so much “understand” as extrapolate from the data they were built from. Significant advances in AI, as well as the provision of specific types of information, will be needed before AI systems can interpret and understand interfaces as well as the median human. And before they can generate alternate interfaces, they may need many good examples of each of the wide variety of interfaces needed by people with different types, degrees, and combinations of disability.
Understanding in Computer Vision
Beyond recognizing objects or elements, machine vision systems must understand the context. For instance, distinguishing between a volume slider and a scroll bar is not just about their shapes but understanding their roles within the interface. Machine vision needs to move beyond pixel-level analysis and even object recognition to a more semantic understanding, extracting the meaning or intent behind visual elements.
Local Artificial Intelligence
AI technology is currently heavily cloud-based. However, hardware advances are making it increasingly possible to run AI applications locally on devices such as smartphones, laptops, and even embedded systems. This is due to a number of factors, including more efficient processors and memory; new types of memory, such as neuromorphic memory, that are being developed specifically for AI applications; specialized AI accelerator chips; and new software tools and frameworks that make it easier to develop and deploy AI applications on edge devices. Speech recognition, for example, has steadily moved from being cloud-based to locally based, with an accompanying increase in user privacy. Local AI will be important in allowing users to benefit from these capabilities without compromising their personal data.
It will not be possible to manually create a truly Individual User Interface Generator for each person. Also, users’ needs and abilities will change over time, both increasing and decreasing. It will be important for IUIGs to be able to self-adapt over time based on interaction with users and in response to changes in their environment or to new information. This is beyond anything we have today. It will also be important to determine how to do this without users losing control over their interfaces.
User Interface (UI) Understanding
Understanding user interfaces (UIs) is core to the development of the Info-Bot and IUIG. This involves applying computer vision and machine learning techniques to decipher the components of a UI without being able to delve into its underlying structure. For example, for a graphical interface the Info-Bot would have only pixels as input, and for a voice interface it would have only audio waveforms. The ability to derive the user-interface intent and functionality from just these inputs is an essential first step.
Apple has already taken steps in this direction with a feature called Screen Recognition. This tool, an extension to the VoiceOver screen reader, helps users navigate apps either through Infrastructure mode (i.e., if an app has included its metadata) or through Screen Recognition mode (i.e., if the app does not have metadata). Users can activate Screen Recognition mode to identify on-screen elements, making navigation easier. However, the challenge lies not just in recognizing these elements but in understanding their intent. This will be important for reaching the next stage in UI understanding and matching it to user intent: the tasks users want to accomplish.
Another recent advance in this area is the “Never-ending UI Learner”. Currently, machine learning models that are trained to predict semantic UI information depend primarily on datasets containing human-annotated static screenshots. However, this process is both expensive and, for specific tasks, error-prone. For instance, when annotators have to determine whether a UI element can be “tapped” based on a screenshot, they must make educated guesses using visual cues alone. In response, Wu et al. have recently developed an automated mechanism to infer semantic properties of UIs. The Never-ending UI Learner is an app crawler that automatically examines apps obtained from a mobile app store and crawls them to infer semantic properties of UIs by interacting with UI elements, identifying and learning from different scenarios, and constantly updating the model based on this information.
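The idea of labeling UI elements by interacting with them, rather than guessing from a screenshot, can be sketched as follows. This is an illustrative simplification and not the implementation by Wu et al.: the `SimulatedApp` class, the demo screens, and the rule "an element is tappable if tapping it changes the screen" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: str
    looks_like_button: bool  # what an annotator might guess from a screenshot


class SimulatedApp:
    """Hypothetical stand-in for a real app that is driven through its UI."""

    def __init__(self, screens, transitions, start):
        self.screens = screens          # screen name -> list of Element
        self.transitions = transitions  # (screen, element_id) -> next screen
        self.current = start

    def tap(self, element_id):
        """Tap an element; the screen changes only if a transition exists."""
        key = (self.current, element_id)
        self.current = self.transitions.get(key, self.current)
        return self.current


def crawl_tappability(app, start):
    """Label every reachable element by tapping it and observing the result."""
    labels = {}
    to_visit = [start]
    visited = set()
    while to_visit:
        screen = to_visit.pop()
        if screen in visited:
            continue
        visited.add(screen)
        for element in app.screens[screen]:
            app.current = screen  # return to the screen under test
            after = app.tap(element.element_id)
            changed = after != screen
            labels[(screen, element.element_id)] = changed
            if changed and after not in visited:
                to_visit.append(after)
    return labels


def visual_guess_errors(app, labels):
    """Elements where the screenshot-based guess disagrees with observed behavior."""
    return sorted(
        (screen, e.element_id)
        for screen, elements in app.screens.items()
        for e in elements
        if e.looks_like_button != labels[(screen, e.element_id)]
    )


def build_demo_app():
    """Assumed two-screen app: a plain-looking icon that works, a banner that does not."""
    screens = {
        "home": [Element("settings_icon", False), Element("banner", True)],
        "settings": [Element("back", True)],
    }
    transitions = {
        ("home", "settings_icon"): "settings",
        ("settings", "back"): "home",
    }
    return SimulatedApp(screens, transitions, "home")
```

In the demo app, the crawler finds that the plain-looking settings icon is tappable and the button-like banner is not, which are exactly the cases where a label based on visual cues alone would be wrong.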
Mapping User Intent into Actions
Unlike current assistants that often require a specific command, advanced personal agents need to feature a dialogic interface. In other words, the assistant would use conversation or shared dialogue to understand meaning. It would have the ability to monitor in near-real-time and use natural language processing (NLP) to understand and act upon a user’s intentions without them having to provide exact commands. For example, instead of saying, “Raise the temperature,” a user could express, “I’m feeling cold,” and the assistant would respond by asking if they wanted to turn on the furnace or heat source. The ability to simply express one’s feeling or state may need to be enough for the agent to decipher the necessary action to address some people’s intent.
With today’s technology, it is already possible to use artificial intelligence to map a user’s assertions to the appropriate home automation actions. Consider an example from a YouTube video: a tech enthusiast spent an afternoon configuring ChatGPT as a shortcut on his iOS device. This setup allowed him to converse directly with ChatGPT, which then generated programmatic code to interact with his smart home. When he commented, “I just noticed I’m sitting in the dark in my office,” the system, without prompting or further interaction, turned on the lights. A better model would be for it to ask if the person wanted the lights on. But the key is the ability to identify user needs or intent without specific instruction. Such an intuitive response demonstrates the growing capacity of AI-driven systems to map user statements directly to appropriate actions. Given the trajectory of technological advancements, it’s reasonable to anticipate steady growth in automated comprehension and intent mapping, which will become even more sophisticated, widespread, and commonly available.
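The “ask before acting” model suggested above can be sketched in a few lines. This is a toy illustration only: the cue table, the action names (`heat_on`, `lights_on`), and the keyword matching are assumptions standing in for the language understanding a real agent would need.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Proposal:
    action: str    # hypothetical device command name
    question: str  # what the agent asks the user before acting


# Hypothetical cue table. A real agent would use a language model here,
# not keyword matching; this only illustrates statement -> proposed action.
CUES = [
    (("cold", "freezing", "chilly"),
     Proposal("heat_on", "Would you like me to turn on the heat?")),
    (("dark", "dim"),
     Proposal("lights_on", "Would you like me to turn on the lights?")),
]


def propose(statement: str) -> Optional[Proposal]:
    """Map a statement about the user's state to a proposed action, if any."""
    text = statement.lower()
    for cues, proposal in CUES:
        if any(cue in text for cue in cues):
            return proposal
    return None


def handle(statement: str, confirm: Callable[[str], bool]) -> List[str]:
    """Return the actions to execute; act only if the user confirms."""
    proposal = propose(statement)
    if proposal is None:
        return []
    if confirm(proposal.question):
        return [proposal.action]
    return []
```

Passing the confirmation step in as a function keeps the user in control: the same mapping can be used with a spoken question, an on-screen prompt, or whatever form of confirmation suits the individual.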
Content-Based Understanding
We are already seeing systems that can handle tasks such as “summarize this chart for me” or “turn these bullet points into a well-structured presentation.” In the future they will need to be capable of not just linguistic transformations but also content-based transformations into formats that are more accessible to some people.
In terms of knowledge queries, if one were to ask, “Which restaurant did Tom and I last dine at?”, the system might, based on context, ask for clarification: “Which Tom are you referring to?” If the response is “Walker,” the technology would recall that the last time you met with Tom Walker was approximately five years ago in Seattle, and would pinpoint the exact restaurant from your calendar or charge card history.
These functionalities, though possible to some extent today in a pre-scripted manner, are predicted to be available automatically in the future. This has implications as an assistive and augmentative tool for several groups of people with disabilities or functional limitations, such as dementia or memory concerns, difficulties with visual processing, or learning disabilities: people who may know what they want to accomplish with a product but not understand how to operate its interface to achieve that result.
Multimodal Integration
The Info-Bot and IUIG would need to integrate and process diverse data streams, such as visual (from screens or interfaces), auditory (sounds, voice commands), tactile (touchscreen or hardware button feedback), and potentially more. Having a framework that can seamlessly bring together these varied data forms is invaluable. Currently, Microsoft Research is working on a project titled “The Platform for Situated Intelligence,” an open-source framework that alleviates the engineering challenges that arise when developing systems and applications that process multimodal streaming sensor data (such as audio, video, depth, etc.). The framework is intended to make it easier for developers to build AI that can perceive, understand, and act in our world in real time.
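One basic step in bringing such streams together is aligning them in time. The sketch below is a generic illustration and does not use or imitate the Platform for Situated Intelligence API; the `Event` type, the modality names, and the fixed time window are assumptions made for the example.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the start of observation
    modality: str     # e.g. "visual", "audio", "touch"
    payload: str


def merge_streams(*streams: Iterable[Event]) -> List[Event]:
    """Interleave already time-ordered streams into one time-ordered list."""
    return list(heapq.merge(*streams, key=lambda event: event.timestamp))


def group_by_window(events: List[Event], window: float) -> List[List[Event]]:
    """Group events that fall within `window` seconds of a group's first event."""
    groups: List[List[Event]] = []
    for event in events:
        if groups and event.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups
```

For example, a dialog appearing on screen and an alert chime a tenth of a second later would land in the same group, which lets later processing treat them as one interface event rather than two unrelated signals.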
Understanding how to design an interface for each and every different individual – with their different type, degree, and combination of disability and their different knowledge and technical ability
Many of the above research needs fall in the “general ICT / AI research” realm. However, much of the research needed to realize the Info-Bot/IUIG concept falls outside of mainstream research needs. It requires advances specific to understanding disability and adaptive interfaces, particularly for under-addressed groups. Developing the InfoBot and IUIG will require a significant increase in our understanding of how to design effective interfaces for people with each and every type, degree, and combination of disability. This is especially important for people with multiple disabilities, as most assistive technologies (AT) are designed for a single disability. For example, some AT is designed for people with blindness, but its designers may not consider how people with blindness who also have another disability, such as deafness, cerebral palsy, or cognitive disabilities, would use the product. This can leave people with multiple disabilities without the assistive technology they need. The more important problem, however, is not that assistive technologies do not exist for all of these different individuals, but that we don’t actually know how to design products for people with all types, all degrees, and all combinations of disabilities. Much research is needed to define the best, or even adequate, approaches for all of the permutations and combinations. Only then can IUIGs be created to generate the interfaces these different individuals need.
Automatic Generation of User Interfaces
Automatic generation of user interfaces is an area of both challenge and opportunity. It is an opportunity because the automatic generation of user interfaces is the core of the Individual User Interface Generators (IUIGs), so any advances in this area, disability-related or not, will advance the development of IUIGs. It is a challenge because IUIGs for people with disabilities require the development of much more diverse user interfaces, with an order of magnitude fewer (or no) user interfaces to use as models for each different type, degree, and combination of disability.
- Technology is often designed for a nominal user, who is typically young, able-bodied, and has good vision and hearing.
- Despite increasing efforts to make technology more accessible, it is unlikely that we will ever reach a point where all products are accessible to everyone, out of the box.
- Today only a small subset of products (~3-5%) is accessible – and even these are only accessible to a subset of the range of disabilities and combinations of disabilities that people have.
- With new technologies rapidly being developed, and current product accessibility as low as 3%, it will be tremendously difficult to close the gap if we rely only on traditional approaches that require accessibility to be built in or require access to the infrastructure layer of products.
- Even where there are accessibility features available, they are not always easy to use or understand, and some people with disabilities are excluded from using technology simply for this reason.
- In order to make technology more accessible to everyone, we need to include a new approach that provides individual interfaces that each person can understand and use, rather than demanding that individuals adapt to the interface(s) on products designed for people with a quite different mix of abilities.
- The InfoBot & IUIG would dynamically generate accessible interfaces based on the user’s needs, cognitive abilities, mobility, sensory perceptions, and preferences without requiring any access to the internals of products.
- This approach has a number of advantages for users and industry, including achieving near-total accessibility, standardizing a mental model, reducing industry burden, and increasing scalability and profit.
The InfoBot and IUIG have the potential both to address accessibility problems we are not able to address today, for groups we do not serve well today, and to help meet some of tomorrow’s interface challenges. They could also provide a path for users, for the first time, to access all of the products that are created today without any accessibility. Even when required by law, accessibility is often not provided, but the Info-Bot/IUIG approach could still provide access after the fact. This approach also eliminates the problem that arises when built-in accessibility is broken but not repaired (something that is all too common).
The feasibility, practicality, and limitations of such an approach, however, are currently unknown, and a concerted R&D effort will be needed to explore them. This approach will also require the development of a new social contract between product developers and consumers. It could have a significant impact on policies and regulations around accessibility. Although the fundamental policies would not change, the implementing policies and regulations would need to change. Fortunately, it appears that it would simplify both the regulation and enforcement aspects. In addition, it would help address some of the emerging unsolved problems in accessibility regulation, around closed products and immersive environments, to name two. Further exploration of the concept is underway. For more information see https://info-bot.org.
This chapter draws its content in part from work funded by the National Institute on Disability, Independent Living and Rehabilitation Research at the Administration for Community Living, U.S. Dept. of Health and Human Services Grant #90REGE0008; and from work funded by the National Science Foundation Grant #IIS2312370. The opinions herein are those of the author and not necessarily those of the funding agencies and no endorsement of the funding agencies should be assumed.