Clustering is a fundamental machine learning task: dividing data into groups with similar properties when the training dataset has no known class labels. It is often performed during exploratory data analysis to build intuition about the structure of the dataset, or as a preliminary step for more complex models.
The goal was to recognize and match heterogeneous data coming from different sources in different formats.
We introduced a two-step parallelized algorithm that clusters the given data quickly and with a very high confidence score. Overall, the presented algorithm sped up data processing by a factor of 10.
A highly parallelized, complex algorithm was developed with embedded RNN, CNN, and DNN architectures for different types of media. Various metrics were defined based on the DTW (dynamic time warping) path and on Euclidean and cosine distances. Bloom filters were applied to get the final results.
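The exact metrics used in the project are not described here, so the following is only a minimal sketch of the three underlying distance measures, assuming plain numeric sequences as input. The function names (`euclidean_distance`, `cosine_distance`, `dtw_distance`) and the absolute-difference local cost in the DTW routine are illustrative assumptions, not the project's implementation.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity: 0 for same direction, 1 for orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def dtw_distance(s: Sequence[float], t: Sequence[float]) -> float:
    """Total cost of the cheapest DTW warping path between two 1-D sequences.

    Assumption: the local cost of aligning two samples is |x - y|.
    The sequences may have different lengths.
    """
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j] = cheapest alignment of the first i samples of s
    # with the first j samples of t.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat t[j-1] (s advances)
                cost[i][j - 1],      # repeat s[i-1] (t advances)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Unlike the Euclidean and cosine distances, DTW can compare sequences of different lengths: `dtw_distance([1, 2, 3], [1, 2, 2, 3])` aligns the repeated sample at no extra cost, which is why it suits time-stretched signals such as audio.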
PCA
K-means
Decision trees
Linear models
PageRank
Digital filters
DTW
Deep learning
Probabilistic graphical models
CART
Ensembles
Unsupervised sound segmentation
Recurrent models
Bayesian approach
Probabilistic programming
HMM
AlexNet
VGG
VAE
TF-IDF
LDA
SVM
Naive Bayes
word2vec
Attention models
Hi, we are Sciforce, a company where the integration of various branches of science builds a powerful force for creating robust software solutions. Working at the intersection of computer science with other technical and natural sciences and the humanities lets us go beyond traditional IT services and become both a technical and a scientific force for our customers.