Competences

Machine learning: Hands-on experience with several core machine learning algorithms such as Logistic Regression, Decision Trees, Random Forests and Gradient Boosting. In addition, I can develop a neural network from scratch, implementing input layers, hidden layers, output layers, activation functions, optimizer algorithms, loss functions, etc. With my Master's degree in Artificial Intelligence, I also understand the mathematics behind the algorithms. During my research (Protecting the Great Barrier Reef) I gained experience with Convolutional Neural Networks (ConvNets/CNNs), a class of deep learning models that take an image as input and can, for example, detect objects or patterns in it.
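As a minimal sketch of what "from scratch" means here (illustrative only: the layer sizes, learning rate and synthetic data are placeholders, not research code), a one-hidden-layer network in NumPy could look like this:

```python
import numpy as np

# Synthetic binary-classification data (placeholder for a real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer with ReLU, sigmoid output, binary cross-entropy loss
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))
lr = 0.1  # learning rate (plain gradient descent as the "optimizer")

for epoch in range(500):
    # Forward pass
    h = np.maximum(0, X @ W1 + b1)          # hidden layer, ReLU activation
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))    # output layer, sigmoid activation
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Backward pass: gradients of the loss w.r.t. each parameter
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (h > 0)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Gradient-descent update
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2

print(f"final loss: {loss:.3f}, accuracy: {((p > 0.5) == y).mean():.2f}")
```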

Statistical Analysis: Experience with implementing and analyzing correlation matrices of data features to summarize a large dataset and to identify and visualize patterns in the data. In addition, I have experience analyzing confusion matrices, loss curves, etc.
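For illustration, a correlation matrix can be computed and visualized in a few lines with pandas and seaborn (the data frame below is a synthetic placeholder):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder dataset; in practice this would be the project's feature table
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["f1", "f2", "f3", "f4"])
df["f4"] = df["f1"] * 0.8 + rng.normal(scale=0.2, size=500)  # induce a correlation

corr = df.corr()  # Pearson correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature correlation matrix")
plt.tight_layout()
plt.show()
```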

Data Engineering: Acquisition of large datasets and feature extraction (transforming arbitrary data, such as text or images, into numerical features usable for machine learning).
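A typical text-to-features step might look like the sketch below (scikit-learn TF-IDF; the example documents are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents standing in for arbitrary raw text data
docs = [
    "organic whole milk 1 liter",
    "semi-skimmed milk 2 liters",
    "dark chocolate bar 70% cocoa",
]

# Transform raw text into a numerical feature matrix usable by ML models
vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)  # sparse matrix: documents x features
print(X.shape, vectorizer.get_feature_names_out()[:5])
```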

Information Retrieval: The process of extracting useful information that satisfies an information need from large collections of unstructured data. During my Master's thesis I implemented several NLP techniques to filter out the most important keywords from food product descriptions.
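The thesis pipeline itself is not reproduced here; as one common keyword-extraction approach, terms can be ranked by TF-IDF weight per description (illustrative sketch with placeholder descriptions):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder product descriptions; the thesis data is not reproduced here
descriptions = [
    "Organic whole milk, 1 liter carton, lactose free",
    "Dark chocolate bar with 70% cocoa, fair trade, 180 grams",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(descriptions)
terms = vectorizer.get_feature_names_out()

for i, desc in enumerate(descriptions):
    row = tfidf[i].toarray().ravel()
    top = terms[np.argsort(row)[::-1][:3]]  # three highest-weighted terms
    print(f"{desc[:40]}... -> keywords: {list(top)}")
```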

Optimization algorithms: Experience with several hyperparameter optimization algorithms such as Hyperband optimization, Bayesian optimization and random search. This includes optimizing hyperparameters such as the number of hidden layers, the number of hidden neurons, the optimizer algorithm and the learning rate. A neural network configured with well-chosen hyperparameters can be considerably more accurate. Besides that, I am familiar with regularization techniques.
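As one illustration (random search with scikit-learn; the search space below is a placeholder, not a recommendation), hidden-layer sizes, optimizer and learning rate can be tuned like this:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Search space: number/size of hidden layers, optimizer and learning rate
param_distributions = {
    "hidden_layer_sizes": [(16,), (32,), (32, 16), (64, 32)],
    "solver": ["adam", "sgd"],
    "learning_rate_init": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```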

Data Mining: Employing previously mentioned techniques for identifying patterns in large datasets.

Fundamental Knowledge:
 •  Association
 •  Data Visualization
 •  Data Preparation / Data cleaning
 •  Classification
 •  Clustering
 •  Neural Networks
 •  Statistical techniques
 •  Tracking patterns

Tooling:
 •  DVC
 •  DBT
 •  Data validation - Great Expectations
 •  Docker (incl. Docker Compose)
 •  MLflow / Azure ML Studio
 •  Minikube (Kubernetes)
 •  Kubeflow
 •  Gradio model deployment
 •  Grafana
 •  GitHub Actions
 •  PostgreSQL
 •  MySQL
 •  SQLAlchemy
 •  Snowflake
 •  MongoDB
 •  Streamlit
 •  Azure Cosmos DB, Container Registry and Container Instances
 •  Azure Queues
 •  FastAPI
 •  Ray (hyperparameter tuning)

Work

 •  Kaggle competition 2022
 •  Master's thesis at Ahold Delhaize - Albert Heijn
 •  Aurai - Project Manufy

TensorFlow - Help Protect the Great Barrier Reef

The Great Barrier Reef in Australia is the world's largest coral reef, home to 1,500 species of fish, 400 species of coral and a wide variety of other sea life. The reef is currently under threat, in part because of the overpopulation of one particular starfish: the coral-eating crown-of-thorns starfish (COTS), one of the main factors responsible for coral loss on the Great Barrier Reef. A few COTS on a reef can be beneficial for biodiversity, as they keep down the growth of fast-growing coral species and thus leave space for other corals. Currently, however, the COTS population is growing at such a pace that coral is being eaten faster than it can grow back. Outbreaks can reach thousands of starfish on individual reefs, eating up most of the coral and leaving behind a white-scarred reef that takes years to recover. Object detection algorithms are well suited to spotting them, since COTS are among the largest starfish species: 25-35 cm on average, growing up to 80 cm in diameter, which makes them relatively easy to spot on a reef. One way of improving the efficiency and scale at which marine scientists survey for COTS is therefore to implement AI-driven environmental surveys.
Two types of Convolutional Neural Networks were implemented during this research: Faster R-CNN and YOLOv5. Furthermore, several ensemble methods were explored, such as NMS (Non-Maximum Suppression), Soft-NMS and WBF (Weighted Boxes Fusion).
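These ensemble methods are standard techniques; as a rough sketch of the idea behind plain NMS (not the competition code), a greedy IoU-based suppression can be written as:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the highest-scoring box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        # Drop the remaining boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep

# Toy example: two overlapping detections and one separate detection
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]
```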
The full research paper covers data analysis, data preparation, methods and results.
The research paper is available on ResearchGate: Research paper.

Master Artificial Intelligence Thesis - Ahold Delhaize

Will be available soon!

Aurai - Project Manufy

Manufy offers a platform where sellers of products can interact with manufacturers, using a chat service to communicate. However, some chat messages violate Manufy's policies or are illegal, so messages need to be evaluated automatically as they are sent. Using NLP techniques and other feature engineering tasks, we developed a model that labels incoming messages. MLflow was used to track the experiments with their corresponding metrics, parameters and artifacts (model registry). Besides the feature engineering and developing the model, I was fully responsible for deploying the model as an endpoint in Azure Machine Learning Studio (blue-green deployment). I used Docker to develop and deploy the complete pipeline. The pipeline in Azure (consisting of container instances, Azure Queues and Azure Functions) performs inference using the existing endpoint in Azure ML Studio. Furthermore, I was responsible for cloud engineering tasks such as setting up and managing Cosmos DB databases in Azure. During this project Git was used to collaborate with the data engineers.
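A representative MLflow tracking call looks like the sketch below (the experiment name, model, parameters and metrics are placeholders, not the actual project values):

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder data and model standing in for the message-labeling pipeline
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("message-labeling")  # hypothetical experiment name
with mlflow.start_run():
    model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("f1", f1_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # store the model artifact for the registry
```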

Certifications

Snowflake - Data Warehousing

 •  Account Editions, Regions & Clouds
 •  Snowflake Identity, Access, Users, & Roles
 •  Databases, Ownership and Context
 •  Creating and managing Worksheets and Warehouses
 •  External Named Stages (AWS S3 bucket)
 •  Loading Tables Using SQL Insert Statements
 •  Creating File Formats, Stages, and Copy Into Loading
 •  Semi-Structured Data including XML and JSON
 •  Querying Nested Semi-Structured Data

Snowflake - Data Applications

 •  App builder tools such as Streamlit, Python, GitHub, and REST web service APIs
 •  Internal Named Stages
 •  Working with SnowSQL (Snowflake's CLI tool), performing file PUTs

About

I (Marc Blomvliet) obtained a Master's degree in Artificial Intelligence at Vrije Universiteit Amsterdam. My Master's thesis, conducted at Ahold Delhaize - Albert Heijn, focused on improving data quality control of food products with the use of machine learning techniques. I also obtained a Bachelor's degree in Software Engineering / Electrical Engineering; my Bachelor's thesis focused on image recognition, a branch of Artificial Intelligence.

I am passionate about machine learning, statistics, data mining, programming, and many other data science-related fields.

Projects

Some of the recent projects I've been working on are shown below. For the majority of these projects, I was employed by Aurai as an interim Machine Learning Engineer or Data Engineer.

Southfields

Southfields was looking for a way to generate automatic match summaries from their data...

Read more

Gemeente Stichtse Vecht

Coming soon...

Read more

Gemeente Lochem

The municipality of Lochem had no insight into the size, composition and utilization of its housing stock. I was fully responsible for data extraction, transformation and ingestion, and for developing dashboards to guarantee client satisfaction...

Read more

Manufy

Manufy offers a platform where sellers of products can interact with manufacturers, using a chat service to communicate...

Read more

Ahold Delhaize - Albert Heijn

Coming soon...

Read more

Southfields

Southfields was looking for a way to generate automatic match summaries from their data. The question arose mainly because the editorial office spends a lot of time creating up-to-date content and researching and writing up the statistics of match events. For this, I used a Large Language Model, OpenAI's GPT-3.5, which I fine-tuned on match summaries and match data. To make sure the input data was in an understandable form, I built a data preparation pipeline in Python: different data feeds are obtained via an API from which Southfields purchases its data, and the prepared data is transferred to an S3 bucket in AWS. A front-end application was built with Streamlit in combination with HTML and runs on an EC2 instance in AWS; Route 53 was used to point a custom domain name at it. To deploy to the EC2 instance automatically, I was responsible for creating an AWS role with the policies needed to give the EC2 instance the required rights and for building the CI/CD. To keep credentials secure, I used AWS Secrets Manager for managing API keys, passwords and usernames. To enhance developer productivity, I implemented CI/CD with GitHub Actions, AWS CodePipeline and AWS CodeDeploy; an added advantage is that failures are detected, and therefore repaired, faster, which increases the release rate. Furthermore, I was responsible for giving weekly updates to my team, consisting of two senior front-end developers.
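As a small illustration of two of the AWS building blocks mentioned above (the bucket, secret and file names are made-up placeholders, not the project's actual resources), the boto3 calls for Secrets Manager and the S3 upload look roughly like this:

```python
import json
import boto3

# Hypothetical resource names for illustration only
SECRET_ID = "southfields/openai-api-key"
BUCKET = "example-match-data-bucket"

# Read an API key from AWS Secrets Manager instead of hard-coding it
secrets = boto3.client("secretsmanager", region_name="eu-west-1")
secret_value = secrets.get_secret_value(SecretId=SECRET_ID)["SecretString"]
api_key = json.loads(secret_value)["api_key"]

# Upload a prepared data file to S3 for the downstream summarization step
s3 = boto3.client("s3", region_name="eu-west-1")
s3.upload_file("prepared_match_data.json", BUCKET, "prepared/match_data.json")
```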

Gemeente Lochem

The municipality of Lochem had no insight into the size, composition and utilization of its housing stock. Due to a reporting obligation from the 'Housing construction' programme for the period 2022-2030, it was imperative to find a solution as soon as possible. In order to create an insightful report, they asked me to further develop the data lakehouse and to build several dashboards to gain insight. I worked with data sources such as BAG, BRK, BRP, WOZ and CBS, which are sources that certain Dutch authorities, such as the municipality of Lochem, can access and manage. Azure Data Factory (ADF) ingests the data into the data lakehouse (Databricks). I used dbt to make the necessary data transformations; these transformations are translated into views, created with dbt in the data lakehouse, which are then used to build the dashboards in Databricks. These dashboards are managed directly in Databricks and can be accessed by the stakeholders within the municipality of Lochem. For this use case, Databricks Dashboards met all the requirements needed to achieve customer satisfaction, so a more advanced visual analytics platform such as Tableau was not needed.
