All hail!
I am Giulia Preti.
I am a Researcher in the fields of Graph Mining and Complex Network Analysis at CENTAI, Turin (Italy).
I was a Post Doc in the area of Learning and Algorithms for Data Analytics at ISI Foundation, Turin (Italy), under the coordination of Francesco Bonchi.
I was a member of the DbTrento research group, and I worked on techniques for mining relevant structures in dynamic and heterogeneous datasets.
I got my PhD in Information and Communication Technology at the University of Trento (Italy), under the supervision of Prof. Yannis Velegrakis.
I was also the teaching assistant for the Computability and Computational Complexity course from 2015 to 2018.
I got my master's degree in Computer Science and my bachelor's degree in mathematics, both of which pursued at the University of Trento.
[Download CV]
Research Interests:
My research focuses on graphs, a versatile data model that has been increasingly used to represent a large plethora of data, from biology to social networks, and from computer networks to smart cities. In particular, I consider weighted graphs and dynamic graphs.
Weighted graphs are graphs whose nodes and edges are labeled with weights indicating their relevance or quality. Moreover, in applications aiming at offering personalized products and services to each user rather than ``one size fits all'' solutions, each element of the graph naturally carries multiple weights, one for each user. My goal is to identify structures that appear frequently in the graph and whose appearances are characterized by large weights, and hence are relevant for the user, under the assumption that larger weights indicate higher interest.
Dynamic graphs are graphs that change over time, meaning that their nodes and edges can undergo both structural and attribute changes. They are generally modeled as sequences of static graphs called snapshots. In this context, I am interested in detecting groups of edges that evolve in a convergent manner, meaning that they display a positive correlation in their behavior. These groups of correlated edges, especially when they involve topologically close edges, can represent regions of interest in the network.
More recently, I started investigating pattern mining techniques and null models for complex networks such as hypergraphs and simplicial complexes, which are higherorder generalizations of graphs able to model kary relations among entities. Hypergraphs are collections of hyperedges, which are edges that connect more than two vertices. Conversely, a simplicial complex is a collection of polytopes such as triangles and tetrahedra, which are called simplices.
During my PhD studies, I also worked on entity resolution in highly heterogeneous and temporal databases, defined as collections of records characterized by different schemata and timestamps indicating the date of creation. The reconciliation of the records in this kind of situation requires specialized similarity functions that take into consideration both the heterogeneity and the dynamism of the data. In my work, I proposed a suitable timeaware schemaagnostic similarity measure and a framework that uses this measure to identify maximal groups of similar temporal records.
Publications:

Impossibility result for Markov chain Monte Carlo sampling from microcanonical bipartite graph ensembles.
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato,
Physical Review E, 2024
10.1103/PhysRevE.109.L053301 
Hyperdistance Oracles in Hypergraphs.
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi,
The VLDB Journal, 2024
(PDF) 
MaNIACS: Approximate Mining of Frequent Patterns through Sampling.
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato,
TIST, 2023
10.1145/3587254 
ALICE and the Caterpillar: A More Descriptive Null Model for Assessing Data Mining Results.
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato,
ICDM, 2022
(PDF) 
FreSCo: Mining Frequent Patterns in Simplicial Complexes.
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi,
The Web Conference, 2022
(PDF) 
MaNIACS: Approximate Mining of Frequent Patterns through Sampling.
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato,
KDD, 2021
(PDF) 
Discovering Dense Correlated Subgraphs in Dynamic Networks.
Giulia Preti, Polina Rozenshtein, Aristides Gionis, Yannis Velegrakis
PAKDD, 2021
(PDF) 
STruD: Truss Decomposition of Simplicial Complexes.
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi
The Web Conference, 2021
(PDF) 
Mining Dense Subgraphs with Similar Edges.
Polina Rozenshtein, Giulia Preti, Aristides Gionis, Yannis Velegrakis
ECMLPKDD, 2020
(PDF) 
ExCoDE: a Tool for Discovering and Visualizing Regions of Correlation in Dynamic Networks.
Giulia Preti, Polina Rozenshtein, Aristides Gionis, Yannis Velegrakis
ICDM Workshops, 2019
(PDF) (poster) 
Mining Patterns in Graphs with Multiple Weights.
Giulia Preti, Matteo Lissandrini, Davide Mottin, Yannis Velegrakis
Distributed and Parallel Databases Journal, 139, 2019
https://doi.org/10.1007/s1061901907259w 
Beyond Frequencies: Graph Pattern Mining in Multiweighted Graphs.
Giulia Preti, Matteo Lissandrini, Davide Mottin, Yannis Velegrakis
EDBT, 2018
(PDF) (PPT)
Projects and Code:

HypED
Overview
HypED is an algorithm to answer pointtopoint sdistance queries in hypergraphs, approximately. The algorithm can answer three types of queries: vertextohyperedge, vertextovertex, and hyperedgetohyperedge. This is achieved by constructing a distance oracle, which can be stored on disk for future usage.
The distance oracle stores distances from landmark hyperedges to reachable hyperedges, so that the distance between two hyperedges can be approximated via triangle inequalities. The algorithm requires in input an integer L used to compute the desired oracle size O = L x E, where E is the number of hyperedges in the hypergraph.
Code
The code of this project is publicly available on GitHub.

ALICE
Overview
ALICE is a suite of two MarkovChain MonteCarlo algorithms for sampling datasets from our novel null model, based on a carefully defined set of states and efficient operations to move between them.
This null model preserves the Bipartite Joint Degree Matrix of the bipartite graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to the item supports and the transaction lengths.
ALICEA is based on Restricted Swap Operations (RSOs) on biadjacency matrices, which preserve the BJDM. ALICEB adapts the CURVEBALL approach to RSOs, to essentially perform multiple RSOs at every step, thus leading to faster mixing.
Code
The code of this project is publicly available on GitHub.

FreSCo
Overview
FreSCo is an algorithm to find frequent patterns in simplicial complexes. A pattern, or simplet is defined as a subcomplex, and its frequency is determined by its occurrences in the complex.
The algorithm generalizes the Minimum Node Imagebased (MNI) frequency measure to the complex setting and then returns the set of patterns whose frequency is greater than a given threshold.
Code
The code of this project is publicly available on GitHub.

MaNIACS
Overview
MaNIACS is a samplingbased randomized algorithm for computing approximations of the collection of the subgraph patterns that are frequent in a single vertexlabeled graph, according to the Minimum Node Imagebased (MNI) frequency measure.
The output of MaNIACS comes with strong probabilistic guarantees. The quality of the approximation is obtained using the empirical VapnikChervonenkis (VC) dimension, a key concept from statistical learning theory. In particular, given a failure probability, a frequency threshold, and a sample size, with at least such probability over the choice of the sample of such size, the output of MaNIACS contains each pattern of size k with relative MNI frequency greater than the threshold and with estimated frequency within epsilon from the relative MNI frequency.
MaNIACS leverages properties of the frequency function to aggressively prune the pattern search space, and thus to reduce the time spent in exploring subspaces containing no frequent patterns. The framework includes both an exact and an approximate mining algorithm.
Code
The code of this project is publicly available on GitHub.

STruD
Overview
STruD is a framework to perform the simplicial truss decomposition of simplicial complexes. A simplicial complex is a generalization of a graph: a collection of nary relationships (instead of binary as the edges of a graph), named simplices.
STruD generalizes the graph notion of truss decomposition to complexes and includes algorithms to find (i) the simplicial truss decomposition of a simplicial complex, (ii) the topn simplices with maximum trussness, (iii) the ktruss of simplices of size greater or equal to a given q, and (iv) the standard truss decomposition of the 1skeleton of a simplicial complex.
Code
The code of this project is publicly available on GitHub.

ReSuM
Overview
ReSuM is a framework to mine relevant patterns from largeweight and multiweighted graphs. Assuming that the importance of a pattern is determined not only by its frequency in the graph, but also by the edge weights of its appearances, we propose four scoring functions to compute the relevance of the patterns. These functions satisfy the apriori property, and thus can rely on efficient pruning strategies.
The framework includes an exact and approximate mining algorithm. The first is characterized by intelligent storage and computation of the pattern scores, while the second is based on the aggregation of similar weighting functions to allow scalability and avoid redundant computations.
Code
The code of this project is publicly available on GitHub, while the datasets used in the experiments may be provided upon request.

ExCoDE
Overview
ExCoDE is a general framework to mine diverse dense correlated subgraphs from dynamic networks. The correlation of a subgraph is computed in terms of the minimum pairwise Pearson correlation between its edges. The density of a subgraph is computed either as the minimum average degree among the snapshots of the networks where the subgraph is active, or as the average average degree among the snapshots of the networks where the subgraph is active. The similarity between different subgraphs is measured as the Jaccard similarity between the corresponding sets of edges.
Code
The demo of this project is available on GitHub, together with several datasets. Additional information may be provided upon request.