Dissecting embedding methods: learning higher-order structures from data
Speaker Bio
Liubov Tupikina is a senior researcher in computer science, mathematics, and the physics of complex systems. She currently works on embedding theory, low-dimensional data representations, representations of data-encoded systems by higher-order mathematical structures, and hypergraph encoding using algebraic theory (broadly, the area of explainable AI).
Abstract
An active area of AI research is the theory of manifold learning: finding lower-dimensional manifold representations that let us learn geometry from data and thereby provide better-quality curated datasets. These methods, however, face various issues in finding low-dimensional representations of data, related to the so-called curse of dimensionality. Geometric deep learning methods often rely on a set of assumptions about the geometry of the feature space. These include pre-selected metrics on the feature space and the use of an underlying graph structure that encodes the proximity of data points. However, the latter assumption, that a graph is the underlying discrete structure, encodes only binary pairwise relations between data points. This prevents us from capturing the more complex higher-order relationships that are often present in various systems. These assumptions, together with the data being discrete and finite, may lead to generalisations that in turn create wrong interpretations of the data and of the models that produce the data embeddings (such as BERT and others).

The objective of this talk is to discuss several aspects of extracting higher-order information from data. First, we will discuss how to characterize the accuracy of embedding methods using higher-order structures. For this, we explore the underlying graph assumption and substitute it with hypergraph structures. Second, we aim to demonstrate this embedding characterization on a use case: example data with higher-order relations (such as arXiv open data).
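The limitation of pairwise graphs mentioned in the abstract can be illustrated with a toy co-authorship example. This is a minimal sketch, not the speaker's method: the author labels, the two toy datasets, and the helper `clique_projection` are hypothetical and chosen only for illustration. It shows two different hypergraphs (one three-author paper versus three two-author papers) that collapse to the same pairwise graph.

```python
from itertools import combinations


def clique_projection(hyperedges):
    """Return the set of unordered pairs that co-occur in at least one hyperedge.

    Hypothetical helper: this is the usual way of flattening a hypergraph
    into a graph, which keeps only pairwise relations.
    """
    pairs = set()
    for edge in hyperedges:
        for u, v in combinations(sorted(edge), 2):
            pairs.add((u, v))
    return pairs


# Hypothetical toy data: each hyperedge is the author set of one paper.
one_joint_paper = [{"A", "B", "C"}]                     # one paper, three authors
three_pair_papers = [{"A", "B"}, {"B", "C"}, {"A", "C"}]  # three papers, two authors each

graph_1 = clique_projection(one_joint_paper)
graph_2 = clique_projection(three_pair_papers)

# The pairwise graphs coincide, although the hypergraphs differ.
same_graph = graph_1 == graph_2
same_hypergraph = (
    {frozenset(e) for e in one_joint_paper}
    == {frozenset(e) for e in three_pair_papers}
)

print("same pairwise graph:", same_graph)
print("same hypergraph:", same_hypergraph)
```

Any method that only sees the pairwise graph cannot distinguish these two situations, whereas a hypergraph representation keeps them apart.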