Stanford Seminar - If We Can Make It There: Notes On Urban Interaction

12 Oct 2024

Introduction and Cornell Tech

  • The speaker is a Silicon Valley native, having grown up in San Jose's Almaden Valley, and later moved to New York City in 2018 to join Cornell Tech (10s).
  • Cornell Tech is a graduate school and research center for Cornell University, established after winning a competition to build an applied science campus on Roosevelt Island in New York City (1m33s).
  • The campus is located on Roosevelt Island, which officially belongs to the borough of Manhattan but is somewhat isolated, sitting in the East River between Manhattan and Queens with the Queensboro Bridge in the backdrop (1m26s).
  • The mission of the campus is to emphasize applying technology to real-world challenges and engaging with external partners, which was partly spurred by the 2008 recession and the need for economic diversification in New York City (2m6s).
  • New York City is now second in the world for startups and new ventures, and Cornell Tech is a small part of this development (2m20s).
  • The speaker will be discussing three lines of research from their current program that fit under the theme of urban interaction, including trash bots in the city, urban fingerprinting, and communal extended reality (2m34s).

Trash Bots in New York City

  • The research projects are ongoing and do not have a clear end, allowing the speaker to explore exciting ideas without a set outcome (2m47s).
  • Some of the research threads date back to the speaker's days at Stanford, and they are excited to share their work (2m59s).
  • The speaker will be showing a video from work done in 2014-2015 at Stanford, which involves phenomenological experiments to understand how people interact with robots in the future (3m12s).
  • Experiments were conducted to understand how people interact with machines, specifically a robot that resembles a trash barrel, to show that people interact socially with machines regardless of their appearance (6m20s).
  • The experiments revealed that people engage in various social interactions with the robot, including negotiating and explaining the preconditions for interaction, even when there is no actual interaction (6m55s).
  • The mental models people have for the robot were also observed, with some individuals believing the robot has a desire to "eat" garbage, despite its obvious lack of a digestive system (7m45s).
  • The experiments also showed that people use a social interaction model with the robot, though not a fully human one; they still expect to be thanked or acknowledged when they give the robot something (8m36s).
  • The experiments were conducted in a controlled environment, but the researchers also wanted to see if the results would be the same in a real-world setting, such as in New York City (8m54s).
  • A separate study was conducted in Manhattan, New York City, where two robots, one for landfill waste and one for recycling, were deployed and the interactions between people and the robots were observed in a public space (9m21s).
  • The study area in Manhattan was a trapezium-shaped traffic island with metal tables and chairs for public seating, providing a unique setting for observing human-robot interactions (9m28s).
  • Observations of people's interactions with robots in public spaces show that they often treat the robots as if they are autonomous, even when they are being remotely controlled, and may talk to them as if they are dogs or learning entities (9m33s).
  • People may make assumptions about the robots' abilities, such as thinking they can be trained or that they are learning, even when this is not the case (12m21s).
  • Field experiments in public spaces can be useful for studying human-robot interactions, but they also come with challenges, such as the inability to control the environment or prevent repeat interactions (12m45s).
  • Social media can play a significant role in shaping people's perceptions of robots, with some individuals seeking out robots after seeing videos or posts about them online (13m11s).
  • A video of New York City Mayor Eric Adams interacting with a robot trash collector was shared on TikTok, highlighting the public's interest in these types of robots and the questions they raise about their use and funding (13m25s).
  • When interacting with robots, people often ask questions about who is responsible for the robot and its purpose, rather than how the robot works (14m25s).
  • The comments on the TikTok video raised issues that were not anticipated, such as concerns about the cost and effectiveness of the robot trash collectors, particularly in the context of other problems facing New York City (14m51s).
  • People often express skepticism about the feasibility of certain projects in urban areas, citing the unique challenges of neighborhoods like the Bronx, but every neighborhood has its own character and potential for innovation (15m9s).
  • A deployment was successfully conducted in downtown Brooklyn, and footage from the event shows people interacting with robots in various ways, including negotiating and explaining things to the robots (15m49s).
  • The interactions between humans and robots can be complex, with people sometimes becoming frustrated or annoyed if the robots do not understand or respond as expected (19m4s).
  • In some cases, people may expect the robots to remember them from previous interactions and become annoyed if the robots do not recall their previous conversations (18m54s).
  • The footage also shows people teasing and interacting with the robots in a playful way, and the robots are able to learn and adapt to the interactions (20m22s).
  • The robots are also able to assist with tasks such as cleaning up trash, and people are willing to help the robots and provide feedback (20m38s).
  • The project involves collaboration with ethnographers and linguists to analyze the footage and understand the interactions between humans and robots (20m53s).
  • Future deployments are planned for other boroughs, including Queens and the Bronx (21m0s).
  • Simple machines can elicit rich interactions, and the way these interactions vary across different contexts is a fascinating topic for designers to explore and understand (21m4s).
  • People constantly signal to robots what's expected of them, which highlights the self-perpetuating nature of culture and how it influences interactions (21m31s).
  • Culture plays a significant role in shaping interactions, and technologists often overlook its impact when introducing new technologies, assuming everyone will adapt uniformly (22m9s).
  • Autonomous cars are an example of how cultural differences can affect the adoption and use of technology (22m25s).
  • Low-cost, easily replicable methods, probes, and data sharing can help researchers understand cultural differences and their impact on interactions; a minimal teleoperation sketch in this spirit follows this list (22m27s).
  • Barry Brown, a collaborator at Copenhagen University, has replicated the trash barrel robot and run experiments with his students to study cultural differences in interactions (22m47s).
  • Researchers are working on developing methods to share data while preserving participants' privacy, allowing for a deeper understanding of how interactions vary across different places (23m0s).
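As the Q&A below makes clear, the trash barrel robots are driven by human teleoperators rather than autonomy. To illustrate how low-cost such a probe can be, the sketch below shows a minimal keyboard-teleoperation loop for a hypothetical differential-drive base; the serial port, baud rate, and wheel-command protocol are invented stand-ins, not the interface used in the actual robots.

    """Minimal Wizard-of-Oz teleoperation sketch for a trash-barrel-style robot.

    Hypothetical illustration only: the serial port, baud rate, and command
    format are stand-ins, not the hardware interface used in the study.
    Requires pyserial (pip install pyserial).
    """
    import serial

    # Map single-character wizard inputs to (left, right) wheel speeds in -1.0..1.0
    # for a differential-drive base (an assumption about the robot's drivetrain).
    KEY_TO_WHEELS = {
        "w": (0.5, 0.5),    # forward
        "s": (-0.5, -0.5),  # reverse
        "a": (-0.3, 0.3),   # rotate left in place
        "d": (0.3, -0.3),   # rotate right in place
        "x": (0.0, 0.0),    # stop
    }

    def drive(port: str = "/dev/ttyUSB0", baud: int = 115200) -> None:
        """Read wizard keystrokes from stdin and forward wheel commands to the base."""
        with serial.Serial(port, baud, timeout=0.1) as link:
            print("Teleop ready: w/a/s/d to move, x to stop, q to quit.")
            while True:
                key = input("> ").strip().lower()
                if key == "q":
                    link.write(b"L0.00 R0.00\n")  # stop the base before exiting
                    break
                if key in KEY_TO_WHEELS:
                    left, right = KEY_TO_WHEELS[key]
                    # Invented ASCII protocol: "L<speed> R<speed>\n".
                    link.write(f"L{left:.2f} R{right:.2f}\n".encode())
                else:
                    print("unknown command")

    if __name__ == "__main__":
        drive()

In the actual deployments, as discussed in the Q&A below, the wizards operated with line of sight and a high-level mission to collect garbage and behave in a socially appropriate manner, rather than following scripted behaviors.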

Urban Fingerprinting

  • Cities are a great place to study culture, and urban fingerprinting is a topic that involves analyzing the unique characteristics of cities (23m10s).
  • Urban fingerprinting can be done at different levels, from the sidewalk level to a broader, map-level view of cities (23m23s).
  • A paper on urban fingerprinting, which uses mobile dash cams to analyze traffic patterns, was recently presented at the Automotive UI conference (23m54s).
  • The concept of urban fingerprinting was developed by students during the pandemic, who referred to it as a way to understand the "Citywide Vibes" (24m12s).
  • Researchers were studying social distancing during the COVID-19 pandemic by instrumenting vehicles to observe how people interact on the streets and using computer vision to analyze the data (24m26s).
  • They used a platform called Mapillary, later acquired by Facebook/Meta, to aggregate their data and share it with others who were also collecting imagery from dash cams, bicycles, and pedestrians (25m25s).
  • The researchers also scraped data from New York City's traffic cameras to validate their own data and get closer to ground truth (25m57s).
  • A company called Nexar, which makes networked dash cameras used in many Uber, Lyft, and taxi vehicles in New York City, can provide the equivalent of Google Street View coverage 8 to 10 times a day (26m9s).
  • By running large-scale computer vision recognition over Nexar's imagery, researchers can detect different things happening in the city, such as the presence of police vehicles; a simplified sketch of this classify-then-aggregate pattern follows this list (26m33s).
  • PhD student Matt Franchi trained a model to recognize police vehicles in the data and used it to see where there are more and less police cars in the city (26m40s).
  • The analysis found that there are more police cars in commercial areas than in residential areas, and more police presence in low-income neighborhoods and Black and Hispanic neighborhoods (27m18s).
  • The researchers had to develop methods to ensure good sampling of the data, as the sampling rates of the dash cams were unknown and considered a corporate secret (27m7s).
  • The findings, while not surprising, provide quantitative evidence of the distribution of police presence in the city, which can be difficult to obtain from the people being studied (27m46s).
  • Sidewalk-level information about cities can be obtained through alternative methods, making it possible to gather data on what people care about in urban areas (27m50s).
  • In New York City, building owners are liable for any harm caused by objects falling from their buildings, which leads to sidewalk scaffolding that can remain up for decades, creating opportunities for crime and accumulating garbage (28m0s).
  • By analyzing data, researchers were able to identify unpermitted scaffolding in New York City by comparing recognizable scaffolding in the data set to the city's Department of Buildings records (28m35s).
  • This information can be used by building inspectors to identify problem areas and address issues, and the data remains publicly archived (28m49s).
  • Researchers are exploring the use of this method to "fingerprint" cities, characterizing how one city is similar to or different from another, despite sampling issues (29m5s).
  • By comparing data from New York City and San Francisco, researchers found differences in image density, largely due to the number of instrumented ride-share vehicles in each city (29m34s).
  • This highlights the potential for cities to collect their own urban data sets using instrumented municipal vehicles, such as buses, to gather information about the city (30m8s).
  • The takeaway from this research is that large urban image data sets can help map urban-scale trends while preserving the ability to audit the data, making it possible to characterize key differences between cities (30m18s).
  • The goal is to use these "fingerprints" to contextualize findings and expectations for generalizability across cities, which has long been a challenge (30m51s).
  • Research has been conducted on urban scale social science, focusing on New York due to the availability of data, to determine how much information can be transferred and applied to other cities (30m59s).
  • Anonymized data sets are available, and researchers are encouraging others to contribute to the study and build upon the existing data (31m12s).
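The police-vehicle analysis above depends on a purpose-trained recognition model and on corrections for uneven sampling, details that are not spelled out in the talk. As a rough illustration of the general classify-then-aggregate pattern, the sketch below tags street images with an off-the-shelf zero-shot classifier and computes per-neighborhood rates; the metadata file, its column names, and the label set are hypothetical, and this is not the model or pipeline the researchers used.

    """Sketch of an urban-fingerprinting pass: tag street imagery, then aggregate.

    Not the pipeline from the talk: a generic zero-shot CLIP classifier stands in
    for the purpose-trained police-vehicle detector, and the image metadata file
    and its columns are hypothetical.
    Requires: pip install transformers torch pillow pandas
    """
    import pandas as pd
    from PIL import Image
    from transformers import pipeline

    LABELS = ["a police vehicle on the street", "an ordinary street scene"]

    # Off-the-shelf zero-shot image classifier, used here purely as a stand-in.
    classifier = pipeline(
        "zero-shot-image-classification",
        model="openai/clip-vit-base-patch32",
    )

    def tag_images(metadata_csv: str) -> pd.DataFrame:
        """Label each image; the CSV is assumed to have image_path and neighborhood columns."""
        frames = pd.read_csv(metadata_csv)
        hits = []
        for row in frames.itertuples():
            scores = classifier(Image.open(row.image_path), candidate_labels=LABELS)
            top = max(scores, key=lambda s: s["score"])
            hits.append(top["label"] == LABELS[0])
        frames["police_vehicle"] = hits
        return frames

    def rate_by_neighborhood(frames: pd.DataFrame) -> pd.Series:
        """Fraction of sampled images containing a police vehicle, per neighborhood.

        A real analysis would also correct for uneven sampling density, since the
        dash-cam sampling rates are not public, as noted above.
        """
        return frames.groupby("neighborhood")["police_vehicle"].mean().sort_values()

    if __name__ == "__main__":
        tagged = tag_images("street_frames.csv")  # hypothetical metadata file
        print(rate_by_neighborhood(tagged))

The scaffolding analysis mentioned above follows the same pattern, with an additional join of detected scaffolding locations against the Department of Buildings permit records to flag unpermitted structures.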

Communal Extended Reality

  • A thread of research revolves around communal extended reality, exploring how people interact with automated vehicles and utilizing virtual reality simulation as a tool (31m21s).
  • Virtual reality is not preferred as a consumer product due to its limitations in replicating the physical world, but it has proven to be a valuable tool for research (31m45s).
  • The study of human interaction is crucial, and researchers have found that people often cannot accurately express their thoughts on future scenarios, making it necessary to use prototypes to gather reactions (32m15s).
  • Simulation is used to prototype scenarios, environments, and design interventions at the city scale, allowing users to react naturally to possible situations (32m44s).
  • A study was conducted using a virtual driving simulation to examine how people interact with each other at intersections, with 170 participants across two locations (33m32s).
  • The study focused on social situational awareness of drivers during encounters at intersections, measuring factors such as direction of approach, signaling, speed, and speed change (33m36s).
  • The results showed that sequential perception and understanding of explicit and implicit cues, such as signaling and speed change, are highly related to awareness of who entered the intersection first (34m9s).
  • The findings have implications for driver-to-driver communication, negotiations, and the design of automated vehicles that are situationally aware of inter-driver social aspects (34m27s).
  • A method is being used to collect movement patterns, which can be used to create movement models that could be applied to programming autonomous vehicles (AVs) to recognize the communicative intent of another car's movement (34m47s).
  • The movement models can also be used to determine how an AV should move to communicate its intentions to other drivers or pedestrians (35m1s).
  • Research is being conducted on how drivers and pedestrians interact, with a focus on how pedestrians outside of cars interact with AVs, as they did not sign up for this future with autonomous vehicles (35m24s).
  • The research aims to create vehicles that take into account the way people move their bodies and signal when they're going to go or not go (35m37s).
  • A demo was presented at Auto UI, showing a street crossing scenario where a pedestrian seems to come out from behind a vehicle, and the different perspectives of the driver and pedestrian (35m43s).
  • The research uses motion capture to see how people gesture, gaze, and look, and how they use their bodies differently in different places (36m50s).
  • The study found that people perform differently in different cities and neighborhoods, with some places having a norm of not signaling intentions to cars (37m8s).
  • The research also found that people negotiate intersections differently, and that these differences are important to consider when designing autonomous vehicles (37m26s).
  • Simulation is being used to study how people interact in different scenarios, including on-road and in-lab scenarios (37m41s).
  • A driving simulation infrastructure named Portobello has been developed, which allows drivers or passengers to see virtual objects overlaid on top of their real-world view (37m51s).
  • Portobello makes it possible to port studies developed for in-lab simulators to be run on-road, achieving platform portability (38m15s).
  • Localization technologies from robotics and autonomous vehicle research are used to stage virtual events in the world frame, utilizing lidar, an IMU, and infrared tracking to localize the vehicle and participants' head movement within the vehicle (38m27s).
  • A study on autonomous vehicle interaction was conducted in both an on-road driving simulator and an in-lab driving simulator, allowing for the porting of the study design and validation of the system (38m45s).
  • The system enables people to see virtual avatars exactly where they were programmed in their corresponding location in the real world, making it possible to create portable study designs (39m7s).
  • The workflow to make platform-portable study designs requires mapping the real-world course environment by scanning the test area and obtaining a high-resolution point cloud map of the environment (39m15s).
  • The point cloud is then imported into Unity, allowing the in-lab simulation environment to be built as a digital twin of the real-world course, and study events can be staged on the map at runtime (39m27s).
  • The real-time navigation system is used to locate the vehicle's position and update the headset camera location in Unity; a minimal sketch of this pose-composition step follows this list (39m40s).
  • The goal is to run a pedestrian-driver interaction study in a physical environment, but the technical aspects of achieving this are still being figured out (39m52s).
  • The communal extended reality (CXR) system enables a group of people to use virtual reality headsets inside a moving vehicle and watch the exact same route in a virtual digital twin (40m41s).
  • The CXR system was used to simulate flood and climate change scenarios, and it is believed to have many other implications, such as engaging communities in an embodied and intuitive way (40m47s).
  • The Roosevelt Island digital twin was created based on open data from the New York City Department of City Planning and Google Earth, and it includes digitally modeled buildings, key objects, and a dynamic environment system (41m21s).
  • The digital twin was used to visually narrate nine climate change scenarios immersively to participants, and the study involved groups recruited from Roosevelt Island who participated in pre-ride focus groups (41m51s).
  • A study was conducted in New York in which participants rode a bus and then took part in a focus group about their concerns around flooding; pre-ride thoughts were abstract and general, while post-ride concerns were more specific and more anxious (42m23s).
  • The study found that having a common shared experience of possible futures makes it possible to have conversations about concrete things the community needs to do to plan for those futures (43m2s).
  • Simulation is a great way to understand real reactions to potential futures, and shared experiences pave the way for understanding interaction and generating resolve for action (43m30s).
  • The use of low-cost, easily replicable methods and data collection makes it possible to appreciate the depth and breadth of community-level differences (43m46s).
  • The researchers are sharing their digital models and hope others will be excited to do similar projects, such as comparing digital cities with the same infrastructure (43m57s).
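The on-road workflow above ultimately comes down to keeping the virtual camera registered to the real world: the vehicle's pose in the world frame (from lidar/IMU localization) is composed with the participant's head pose in the vehicle frame (from in-cab tracking) to place the camera in the digital twin. The sketch below shows that composition step with homogeneous transforms; the frame names and example numbers are illustrative, not values from the actual system.

    """Sketch of the pose-composition step behind an on-road XR simulator.

    Illustrative only: the numbers are made up, and a real system would take the
    vehicle pose from lidar/IMU localization and the head pose from in-cab
    tracking at a high update rate, with full 3D orientation.
    """
    import numpy as np

    def pose_matrix(x: float, y: float, z: float, yaw: float) -> np.ndarray:
        """Build a 4x4 homogeneous transform from a position and a yaw angle (radians).

        Roll and pitch are ignored for brevity; a real system would use full 3D
        orientation (e.g., quaternions) from the IMU and head tracker.
        """
        c, s = np.cos(yaw), np.sin(yaw)
        T = np.eye(4)
        T[:3, :3] = np.array([[c, -s, 0.0],
                              [s,  c, 0.0],
                              [0.0, 0.0, 1.0]])
        T[:3, 3] = [x, y, z]
        return T

    # world <- vehicle: where the localization stack says the car is on the map
    T_world_vehicle = pose_matrix(x=120.0, y=45.0, z=0.0, yaw=np.deg2rad(30))

    # vehicle <- head: where in-cab tracking says the participant's head is
    T_vehicle_head = pose_matrix(x=0.4, y=-0.3, z=1.2, yaw=np.deg2rad(-10))

    # world <- head: the pose at which to place the virtual camera in the digital
    # twin so that staged virtual events stay anchored to real-world locations
    T_world_head = T_world_vehicle @ T_vehicle_head

    position = T_world_head[:3, 3]
    heading_deg = np.degrees(np.arctan2(T_world_head[1, 0], T_world_head[0, 0]))
    print(f"camera position: {position}, heading: {heading_deg:.1f} degrees")

Updating this composed pose every frame is what lets study events staged on the point-cloud map appear in their corresponding real-world locations, both in the lab simulator and on the road.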

Conclusion

  • HCI (Human-Computer Interaction) brings an appreciation for on-the-ground, in-person, everyday experiences to urban planning, as well as a willingness to invent new tools to solve age-old problems (44m13s).
  • HCI also brings a determination to solve problems with people, which is something that can be applied to problems in cities like New York (44m47s).
  • The Design Tech department at a university has a search for a new position in interaction design with cutting-edge technologies, and they will be hiring soon (45m2s).

Questions and Discussion

  • A question was asked about the robot project, specifically how the team designed the study and set the parameters for the robot; the speaker's answer follows (45m31s).
  • In the trash barrel robot study, human teleoperators, also known as "wizards," control the robots and keep a line of sight to them, which helps avoid the problems that can arise when operating without line of sight (46m50s).
  • The wizards can see the footage in real-time but often rely on their line of sight, only looking at the footage when they are too far away to see the action (47m8s).
  • Each wizard drives the robot differently, and their personality can be observed in the footage, but this aspect was not controlled for in the study (47m19s).
  • The wizards were given a high-level mission to collect garbage and behave in a socially appropriate manner, with no specific instructions on how to interact with people (47m33s).
  • The study did not control for the personality of the wizards or the people interacting with the robot, and instead focused on characterizing interactions at an interaction level (48m7s).
  • When setting up the trash cans and robot, people sometimes noticed and came over to ask questions, but this was not a significant issue and was considered part of the imperfect and noisy nature of the study (48m31s).
  • The setup process was relatively elegant and quick, with the computer and other equipment being attached to the trash barrel or base, and sometimes people had conversations with the wizards during this time (48m43s).
  • Research has shown that people who understand they are in a "Wizard of Oz" study do not behave differently from those who do not know, which is relevant to the study's methodology (49m32s).
  • People tend to interact with robots in a way that is similar to how they interact with other people, animals, and even inanimate objects, often trying to enforce social norms and pragmatics, especially when they feel that the robot is disrupting the local culture (50m23s).
  • This behavior is not unique to humans, as animals also engage in similar negotiations to coexist in a shared space, and humans are particularly skilled at doing this without even thinking about it (51m52s).
  • The desire to enforce cultural norms and stick to established rules is a strong motivator for people's behavior, which can sometimes lead to resistance against new technologies that disrupt the status quo (52m4s).
  • Observations of people interacting with a robot trash can in New York City showed that they often treated it like a dog, giving it instructions and expecting it to behave in a certain way, which raises questions about how people's reactions might change if the robot were designed for a different task, such as crowd control (52m22s).
  • The context in which the robot is introduced can also influence people's reactions, such as the presence of other robots in the city, like the Boston Dynamics robot dogs used by the NYPD in 2022, which may have contributed to a theme of policing and robots in the city (52m57s).
  • People's perception of robots can be influenced by their appearance and the institution they represent, as seen in an experiment where a robot shaped like a dog was treated with affection, while a robot resembling a garbage can was not, despite both being designed to perform a friendly service (53m10s).
  • The "dog metaphor" is often used by people to speculate about robots and their intentions, but there may be alternative metaphors that could be explored in a controlled lab study (54m22s).

  • The use of Wizard of Oz robotic prototypes can help project what a future with full automation might look like, but it's also possible that the future of robotics could involve a combination of human and robotic elements, creating a "cyborg" effect (54m35s).
  • The idea of a future with fully autonomous robots is not necessarily the only possibility, and it's likely that humans will continue to play a role in the development and operation of robots, even if they are designed to appear autonomous (55m30s).
  • The book "Ghost Work" highlights the fact that many automated systems, including vehicles, often have human operators behind the scenes, and this could be a model for the future of robotics (55m44s).
  • Research has shown that people are more interested in who a robot is for, rather than how it works, and this could be an important consideration in the design and development of robots (56m7s).
  • The concept of autonomous cars and their potential impact on urban interaction is discussed, with the idea that people may be more interested in the presence of SFPD cars than the underlying system controlling the autonomous vehicles (56m18s).
  • The work being done in this field is often mistaken for entertainment or theater spectacle, but it is actually more akin to community improv, aiming to elicit reactions and engage with people in a different way (56m31s).
  • The challenge lies in framing this work in a way that makes sense and conveys its true purpose, which is an ongoing process (56m55s).
  • There are different types of people working in the field of Human-Computer Interaction (HCI), including problem solvers who focus on fixing specific issues and planners who take a more holistic approach (57m5s).
  • Theorists like Terry distinguish between bottom-up and top-down approaches, with the latter being referred to as "The Visionaries" (57m25s).
  • The speaker identifies as an "inside out" person, questioning the current design methodology and seeking ways to do things differently (57m33s).
  • The goal is to develop a new methodology that can engage more people and change the way design is approached (57m46s).
