Overview

We study digital archives of historical materials and methodologies for extracting valuable insights from data.

Mining for Value Concealed within Data

Today, huge volumes of digital data are gathered from both natural and artificial sources: for example, weather and seismic monitoring data, human and vehicle mobility data, and social activity data such as business transactions and medical care records. Digitally archiving historic documents and records that are at risk of being lost also produces digital data. In these cases, digitization secures and enhances the value of knowledge by making it accessible regardless of physical distance. New digital data, such as web pages, social network content, and academic papers, is also being created constantly.

Advances in data analysis and modeling techniques, most notably machine learning, enable us to extract more meaningful and interpretable information from data, while networking technology makes it possible to combine information from various sources. Data science is about turning raw data, a mere stream of digits, into valuable insights and knowledge. It is also closely tied to advances in high performance computing technologies, including high performance processors, storage and networking, big data analytics, deep learning, and numerical algorithms.

The Data Science Research Division was established at the end of 2018, replacing the Academic Information Science Research Division. In addition to conducting research on data science, the division will lead the design and construction of a national infrastructure for the data science research community. It will also continue to collaborate closely with the General Library on the digital archiving project and scholarly database services.

Research Subjects

Machine Learning on Big Human Mobility Data

Our research integrates people's locations, obtained from mobile phone location data, with digitized urban transportation network data and analyzes them together. We apply next-generation AI technology (deep learning, reinforcement learning, ensemble learning, etc.) to these data to predict, moment by moment, the flow of people across multiple means of transportation. The predictions can serve various purposes, such as transportation system control, emergency management, disaster relief, strategies for preventing the spread of infectious diseases, and the allocation of medical resources. We focus in particular on the modeling and simulation techniques that make this sort of prediction possible.
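To make the data-integration step concrete, the following is a minimal sketch and not the division's actual pipeline. It assumes a toy network of three hypothetical stations on a flat grid and a handful of made-up location pings, snaps each ping to its nearest station, and counts movements between stations per time step. Counts of this kind are one plausible input to a flow-prediction model; all names, coordinates, and data here are illustrative assumptions.

```python
from collections import Counter
from math import hypot
from typing import Dict, List, Tuple

# Hypothetical station coordinates on a flat x/y grid (km); not real data.
STATIONS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "B": (5.0, 0.0),
    "C": (5.0, 5.0),
}

# A ping is (user_id, time_step, x, y); the format is an assumption.
Ping = Tuple[str, int, float, float]


def nearest_station(x: float, y: float) -> str:
    """Snap a location to the closest station by Euclidean distance."""
    return min(STATIONS, key=lambda s: hypot(STATIONS[s][0] - x, STATIONS[s][1] - y))


def count_flows(pings: List[Ping]) -> Counter:
    """Count (time_step, origin, destination) moves between each user's consecutive pings."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for user, t, x, y in pings:
        by_user.setdefault(user, []).append((t, nearest_station(x, y)))

    flows: Counter = Counter()
    for visits in by_user.values():
        visits.sort()  # order each user's visits by time step
        for (t0, origin), (_, dest) in zip(visits, visits[1:]):
            if origin != dest:  # ignore users who stayed at the same station
                flows[(t0, origin, dest)] += 1
    return flows


pings = [
    ("u1", 0, 0.2, 0.1), ("u1", 1, 4.8, 0.3), ("u1", 2, 5.1, 4.9),
    ("u2", 0, 0.0, 0.4), ("u2", 1, 5.2, -0.1),
    ("u3", 0, 4.9, 0.0), ("u3", 1, 4.7, 0.2),
]
flows = count_flows(pings)
for key in sorted(flows):
    print(key, flows[key])
```

In this toy data, two users move from A to B at time step 0 and one moves from B to C at time step 1; the third user stays near B and contributes no flow.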

Large-Scale Graph Neural Networks

All entities in digital space and the real world (including objects, facts, and human beings) and the relationships among them can be represented as nodes and edges, which yields large-scale dynamic graphs. We work on graph neural networks (GNNs), which learn graph structures and the roles of nodes and edges via deep learning. We target a variety of GNN applications, including recommender systems for e-commerce, news platforms, and transportation, as well as fraud detection in financial systems. We have also been investigating Materials Informatics, an interdisciplinary research field spanning data science/machine learning and materials science/engineering. We are developing an effective method for storing big materials data, such as the theoretical results of physical simulations and real data from experimental instruments. Moreover, we are using these data to develop machine learning methods for material property prediction that use GNNs to handle molecular graphs effectively.
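As an illustration of how a GNN processes a molecular graph, here is a minimal sketch of a single message-passing round with mean aggregation, followed by a sum readout. It is a generic textbook-style layer, not the division's model: the weight matrices `W_SELF` and `W_NBR` are hand-picked, untrained values, and the one-hot atom features are assumptions chosen only to keep the arithmetic easy to follow.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def message_passing_layer(
    features: List[Vector],
    edges: List[Tuple[int, int]],
    w_self: Matrix,
    w_nbr: Matrix,
) -> List[Vector]:
    """One mean-aggregation message-passing round on an undirected graph."""
    n = len(features)
    dim = len(features[0])
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = []
    for i in range(n):
        # Average the neighbors' feature vectors (zero vector if isolated).
        if neighbors[i]:
            mean = [
                sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
                for k in range(dim)
            ]
        else:
            mean = [0.0] * dim
        own = matvec(w_self, features[i])
        msg = matvec(w_nbr, mean)
        # Combine the node's own state with the neighbor message, then apply ReLU.
        updated.append([max(0.0, a + b) for a, b in zip(own, msg)])
    return updated


def readout(features: List[Vector]) -> Vector:
    """Sum node embeddings into one graph-level vector (independent of node order)."""
    dim = len(features[0])
    return [sum(f[k] for f in features) for k in range(dim)]


# Water molecule: node 0 is O, nodes 1 and 2 are H; features are one-hot [is_O, is_H].
water_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
water_bonds = [(0, 1), (0, 2)]

# Hypothetical, untrained weights chosen for readability.
W_SELF = [[1.0, 0.0], [0.0, 1.0]]
W_NBR = [[0.5, -0.5], [0.25, 0.5]]

hidden = message_passing_layer(water_features, water_bonds, W_SELF, W_NBR)
embedding = readout(hidden)
print(embedding)  # [1.5, 3.0]
```

Because the readout sums over nodes, relabeling the atoms leaves the graph-level vector unchanged, which is the property that makes this family of models suitable for molecules. In a property-prediction setting, such a vector would typically be passed to a trained regression head, and several message-passing rounds would be stacked.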

mdx: a platform for building data-empowered society

mdx aims to be a platform whose functions for collecting, storing, and analyzing data can be rapidly created, extended, and integrated on demand for specific uses.

Utilizing data and applying it for the benefit of society at large requires collaboration across multiple fields and sectors. There is a need to bring together organizations that possess and provide data, particularly corporations and research institutions in specific fields; experts in the various fields needed to resolve issues; and specialists in information and data sciences, including programming, algorithms, and machine learning. As a major step toward that end, mdx aims to facilitate collaboration between universities, national research institutions, industry, and the government.

Using virtualization technology, mdx provides each project with its own private environment (virtual platform). A private virtual environment can be created and configured flexibly for each project, making it possible to install the software stack that the project needs. mdx uses the Japanese academic backbone network, the Science Information NETwork (SINET), to connect remote sensors and storage devices with the computational resources on the data platform, so that incoming data can be processed in real time. Through mdx, a community of data owners, analysts, and users from a variety of fields forms and collaborates across the boundaries of academia and industry to create new value.

As of FY2024, mdx is jointly operated by nine universities and two research institutes, led by the Information Technology Center at the University of Tokyo.

mdx web page