Event Data Standards: Building Community

by Laure Haak

For a game to be used for learning research, data on how it is played needs to be collected. Event data is the raw information that comes out of a game as players interact with it. The actions that players take within a game have meaning, both to the game world and potentially for understanding how we learn. “If you want to do analyses or make assessments or measurements from what players are doing within games, we need to make sure we've captured the raw material that downstream analytic processes can actually use,” explained Erik Harpstead, PhD, Senior Systems Scientist and Faculty in the Human-Computer Interaction Institute at Carnegie Mellon University.

Balancing Breadth and Depth

Erik, Luke Swanson of Field Day Lab at the University of Wisconsin, and Jeci Younger, PhD, Senior Manager of Research Products at PBS KIDS, have been pooling their experience designing, fielding, and analyzing learning games to develop a shared schema for game event data. “One of the strengths of this group is the variety of different game types we have experience with,” said Jeci. “We can try to include in the schema as much of that picture as we can, and also leave some room to add more.”

They found that existing event data schemas, such as Google Analytics, were either too high level or too focused on monetization to be useful in learning games contexts. Because of this, research labs were often left to create bespoke schemas, hampering data sharing and comparative analyses. The standards work is an effort to develop a flexible and extensible schema for the learning games community that can foster cross-team collaboration and sharing, not just of data but also of analytics tools such as processing and visualization code. Their work builds on a schema published by Field Day [add reference], and blends structure and features from the PBS KIDS approach to event data, serious game dialects of Experience API, and CMU learning technology platforms including DataShop. “So, for example,” said Erik, “when Jeci talked about the PBS KIDS approach and mentioned they use four-digit codes for different event categories, we adopted that concept immediately.” Giving each code number a common meaning has proven very useful for making the schema both understandable and extensible across different game designs.
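To make the four-digit code idea concrete, here is a minimal Python sketch of how a shared code could map to an event category. The article does not list the actual codes used by PBS KIDS or the shared schema, so every range and category name below is invented for illustration.

```python
# Hypothetical category ranges: the real code assignments are not given in
# this article, so every number and name below is an illustrative assumption.
CATEGORY_RANGES = {
    "session": range(1000, 2000),
    "player_action": range(2000, 3000),
    "system_feedback": range(3000, 4000),
    "progression": range(4000, 5000),
}


def category_for(code: int) -> str:
    """Return the category name for a four-digit event code."""
    if not 1000 <= code <= 9999:
        raise ValueError(f"expected a four-digit code, got {code}")
    for name, codes in CATEGORY_RANGES.items():
        if code in codes:
            return name
    # Codes outside the shared ranges are left for game-specific extensions.
    return "game_specific"
```

Because the category is determined by the code alone, analysis code written for one game could, in principle, group events from another game the same way, while unassigned ranges leave room for extension.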

Schema Features

Event data represents a dialogue between the player and the system. The schema work is aimed at capturing both player events and system responses: finishing a puzzle or reaching a “success state”, progress through different segments of a game, system-wide changes like score updates, and actions from multiple players. Not every game is the same, and detailed event logging allows researchers to contextualize what a player did. “A big part of the philosophical and theoretical side of this way of doing logging is that capturing both what the player actually did and the results of those actions provides a foundation for understanding more qualitative things – what did the player actually accomplish – and what might that represent about their learning,” Luke said.
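One way to picture this dialogue is to group each player action with the system events that follow it. The Python sketch below is an illustration only: the `Event` fields, the "player"/"system" labels, and the event names are assumptions, not part of the schema described in this article.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sequence: int  # position of the event within a session
    source: str    # "player" or "system" (assumed labels)
    name: str      # hypothetical event name


def pair_actions_with_results(events):
    """Group each player action with the system events that follow it."""
    turns = []
    for event in sorted(events, key=lambda e: e.sequence):
        if event.source == "player":
            turns.append((event, []))
        elif turns:
            turns[-1][1].append(event)
        # System events before any player action are skipped in this sketch.
    return turns


log = [
    Event(1, "player", "place_block"),
    Event(2, "system", "score_updated"),
    Event(3, "system", "puzzle_complete"),
    Event(4, "player", "click_next_level"),
]
turns = pair_actions_with_results(log)
```

Here `turns[0]` holds the `place_block` action together with the two system responses it triggered, which is the kind of action-plus-result context the quote describes.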

This approach enables nuanced analysis. Learning game analytics fall into two broad categories: whether a player is doing the thing the game is designed around, usually measured by a score; and what players are actually doing and how they feel about it. “So, rather than assuming the game is correct and well aligned with its learning objectives,” explained Erik, “with the new schema you can instead look at the broad patterns of player behavior and start to tie those to how might we design the game differently such that this branch isn't possible or is less desirable, and get more people onto the learning path.” Jeci added, “The reason for capturing such rich data is so that we can understand why a player did something, and why the game works or not. A drawback of other event logging systems is that they just report, did you get the answer right, yes or no? We care about context. It’s like a math test where you get points for doing the steps right even if you write the wrong answer at the end. As learning researchers, we care more about how you got to your answer than the final grade.”

A core feature of the schema is the ability to track actions of an anonymous player.  Instead of collecting personal information about players, the schema enables the use of player codes.  That means gameplay of an individual can be assessed over multiple rounds of play, multiple logins, and in some cases, across games.  This is particularly important in educational settings where kids are playing learning games.  Their privacy is paramount.  Using player codes means the event data can be shared publicly with no potential for leaking private information.
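A minimal Python sketch of the player-code idea follows. It assumes codes are random strings that are not derived from any personal information; the code length, the alphabet, and the `player_code` and `session_id` field names are illustrative choices, not details taken from the schema.

```python
import secrets
from collections import defaultdict

# Alphabet without easily confused characters (0/O, 1/I/L); an assumption.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_player_code(length: int = 8) -> str:
    """Return a random code that is not derived from any personal data."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def sessions_by_player(events):
    """Map each player code to the set of session ids logged under it."""
    sessions = defaultdict(set)
    for event in events:
        sessions[event["player_code"]].add(event["session_id"])
    return dict(sessions)


code = new_player_code()
events = [
    {"player_code": code, "session_id": "s1"},
    {"player_code": code, "session_id": "s2"},
]
grouped = sessions_by_player(events)
```

The same code appearing on events from different sessions is what lets an analyst follow one anonymous player over time without storing a name or account.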

In addition to player codes, the schema includes fields for device type, session, time, sequence within a session, version, game state, game configuration, and context. Is the player selecting an item from a list – which one? Is the player interacting with a game character – which one and how? With the standard codes, it becomes possible to sort and filter data based on a set of parameters. “We're trying to design the schema so that you can work with different pieces of the event logging pie, but still have all the relevant information on every single event that you work with,” Jeci clarified.
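As a rough illustration, the fields listed above could be carried on every event record so that any slice of the data remains self-describing. The Python sketch below is an assumption-laden example: the field names, types, timestamp format, and the `event_code` value are hypothetical, since the article names the fields but does not define them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GameEvent:
    # Field names and types are assumptions based on the list in the text.
    player_code: str
    device_type: str
    session_id: str
    timestamp: str   # ISO 8601 string; an assumed format
    sequence: int    # order of the event within its session
    game_version: str
    event_code: int  # hypothetical four-digit category code
    game_state: dict[str, Any] = field(default_factory=dict)
    game_config: dict[str, Any] = field(default_factory=dict)
    context: dict[str, Any] = field(default_factory=dict)


def filter_events(events, **criteria):
    """Keep events whose top-level fields equal every given criterion."""
    return [
        e for e in events
        if all(getattr(e, name) == value for name, value in criteria.items())
    ]


events = [
    GameEvent("ABC123", "tablet", "s1", "2024-01-01T10:00:00Z", 1, "1.2", 2001,
              context={"selected_item": "red_key"}),
    GameEvent("ABC123", "tablet", "s1", "2024-01-01T10:00:05Z", 2, "1.2", 3001),
    GameEvent("XYZ789", "desktop", "s2", "2024-01-01T11:00:00Z", 1, "1.2", 2001),
]
tablet_selections = filter_events(events, device_type="tablet", event_code=2001)
```

Because each record repeats the session, version, and device fields, a filtered subset such as `tablet_selections` can be analyzed on its own without joining back to other tables.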

Sharing the Standard

The current focus is developing a common schema that can serve as the backbone for learning games research.   “Our intention is to put together our best effort at a standard schema that covers a breadth of learning games contexts,” said Luke,  “and then take that back to the research community for feedback.”  The hope is that a shared schema and related code will improve efficiencies and analysis.

They are already thinking about how to encourage adoption. One aspect is designing a modular schema that can be adapted to many learning games contexts, research questions, audiences, and game formats. In addition to gathering community input, they are considering both informal and formal routes to publication, from conference presentations to formal standards bodies, as well as documentation and training. They are also thinking about translation layers to interconnect data from existing schemas, libraries that would work in game engines, and analytics tools that use the schema.
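A translation layer of the kind mentioned above could be as simple as a field-renaming step. The Python sketch below is hypothetical: the legacy field names (`uid`, `sess`, `ts`, `evt`) and the shared-schema names are invented for illustration and do not come from any existing schema.

```python
# Hypothetical mapping from a legacy log's field names to shared-schema names.
LEGACY_TO_SHARED = {
    "uid": "player_code",
    "sess": "session_id",
    "ts": "timestamp",
    "evt": "event_name",
}


def translate(legacy_record: dict) -> dict:
    """Rename known fields; keep unknown ones under 'context' so nothing is lost."""
    shared = {"context": {}}
    for key, value in legacy_record.items():
        if key in LEGACY_TO_SHARED:
            shared[LEGACY_TO_SHARED[key]] = value
        else:
            shared["context"][key] = value
    return shared


record = translate({"uid": "ABC123", "sess": "s1", "evt": "jump", "level": 3})
```

Keeping unmapped fields under `context` rather than dropping them is one possible design choice; it preserves game-specific detail that the shared fields do not cover.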

The modular schema also provides a path for researchers to extend it by contributing components and even building shared analysis tools based on the schema.  Ultimately, the goal is to distribute the standard using an open data infrastructure and build a community of practice around shared event data through an open community engagement process. 

Further Reading

D. J. Gagnon, L. Swanson and E. Harpstead, "Open Game Data: Defining a Pipeline and Standards for Educational Data Mining and Learning Analytics with Video Game Data," 2024 IEEE Conference on Games (CoG), Milan, Italy, 2024, pp. 1-8. doi: 10.1109/CoG60054.2024.10645573.


This article is part of a series on pilot projects carried out by the Field Day team in collaboration with internal and external partners to explore the opportunities and benefits of creating shared approaches to learning games through standards, processes, and infrastructure. This work was supported by the National Science Foundation, Grant # 2243668.
