Clustering is a fundamental machine learning task: dividing data into groups with similar properties when no class labels are available in the training dataset. Clustering is often performed in the exploratory data analysis phase to get a better intuition about the structure of the dataset, or as a preliminary step for more complicated models.
The goal was to recognize and match heterogeneous data from different sources in different formats.
We introduced a two-step parallelized algorithm that clusters the given data quickly and with a very high confidence score. Overall, the presented algorithm sped up data processing by a factor of 10.
A highly parallelized, complex algorithm was developed with embedded RNN, CNN, and DNN architectures for different types of media. Various metrics were defined based on the DTW path and on Euclidean and cosine distances. Bloom filters were applied to obtain the final results.
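To make the distance measures named above more concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the function names, the use of 1-D numeric sequences, and the absolute difference as the local DTW cost are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def dtw_distance(s, t):
    """Classic dynamic time warping between two 1-D sequences.

    Assumption: the local cost is the absolute difference of the samples.
    Returns (total_cost, path), where path is a list of aligned
    (index_in_s, index_in_t) pairs from start to end.
    """
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")

    # cost[i][j] = cheapest alignment of s[:i] with t[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in s only
                cost[i][j - 1],      # advance in t only
                cost[i - 1][j - 1],  # advance in both
            )

    # Walk back from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw_distance([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
    print(euclidean_distance([0, 0], [3, 4]))
    print(cosine_distance([1, 0], [0, 1]))
```

DTW is useful when two sequences have the same shape but differ in speed or length, while Euclidean and cosine distances compare fixed-length vectors such as embeddings. This version runs in time proportional to the product of the two sequence lengths, so longer inputs would typically need a windowing constraint or an optimized library.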
Probabilistic graphical models
Unsupervised sound segmentation
Hi, we are Sciforce - a company where the integration of various branches of science builds up a powerful force to create robust software solutions. Working at the intersection of Computer Science with other technical, natural, and humanitarian sciences lets us go beyond traditional IT services and become both a technical and a scientific force for our customers.