Fake News spread on Twitter

Award-winning hackathon platform for visualizing the spread of fake news on Twitter.

Transformer-based topic detection

Master's thesis on how embeddings from LLMs can be used for topic detection.

Data Viz - Genetic Algorithms

Small website showing the inner workings of a genetic algorithm in an interactive visualization.

AI Songwriter for Business

LLM-based app that writes a soundtrack for any business.

Visualize the ROI of a balcony solar plant

Streamlit app that calculates the potential ROI of a Berlin-based subsidy for balcony solar panels.

AI for Ransomware detection

Research for IBM on a semi-supervised method for detecting ransomware infection in backup data.

GPT-3 fixing gender bias

Medium post on how companies can use AI to improve job descriptions.

Voice-enabled data viz

Bachelor's thesis project: voice-enabled D3.js data visualization controlled by Amazon Alexa.

About

Software engineer by training and AI engineer by heart.

I'm a data-driven problem solver with a passion for building intelligent systems that (sometimes) make the world a better place. With a solid background in machine learning, data analysis, and software engineering, I specialize in developing data-intensive applications that deliver measurable results.

As a data engineer, I can build robust software to mine, cleanse, wrangle, integrate, and store data in a database. I also have experience in business intelligence, analytics, and data visualization.

As a data scientist, I've applied my skills to a wide range of projects, from natural language processing to computer vision. I'm always eager to take on new challenges and explore cutting-edge technologies that can drive innovation and growth.

Skills

Python

JavaScript / Node

Java / Go

AI Engineering

Data Science

Data Engineering

MLOps

DevOps

Data Visualization

AWS

GCP

Vercel

Certificates

Google Cloud Professional ML Engineer

Google Cloud Professional Data Engineer

Udacity Deep Learning Nanodegree

Work Experience

Doit International

Software Engineer – Data & AI

Remote | 08/2021 - Present
  • Engineered and maintained business-critical ETL pipelines (SQL, Dataform) processing 1+ TB/day of cloud billing data powering our flagship SaaS product, Flexsave.
  • Championed a cross-team project to build an internal FinOps platform for asset management and analytics, lowering daily operational overhead 10x and improving the profitability of Flexsave by ~25%.
  • Built and supported ML pipelines (Kubeflow, ARIMA, and XGBoost) forecasting cloud spend for 2,000+ customers.
  • Designed an automated decision-making system leveraging ML predictions, which saved ~1 FTE of work and reduced inefficiencies, increasing profit margins by 20%.
  • Led and mentored a team of 5 engineers to develop and deploy a right-sizing tool in just 4 weeks, resulting in estimated cost savings of $300k per year and a promotion.

Axel Springer NMT

Software & Data Engineer

Berlin, Germany | 08/2020 - 08/2021
  • Engineered a transformer-based (SentenceBERT) topic modeling pipeline (PyTorch, UMAP, HDBSCAN) that automated the labeling of 3k+ news articles per month, saving many hours of manual work.
  • Built 10+ Slack bots monitoring the front page of BILD.de and delivering insights and alerts to the editorial team; saved 5 hours per week of manual checks and cut reaction time to anomalies by 90%.
  • Led a cross-team project to develop an ML model to accurately predict click-through rates of news articles on BILD.de, resulting in a 15% increase in overall engagement metrics.

IBM Research

Data Science Researcher

San Jose, California | 03/2019 - 04/2020
  • Engineered a cybersecurity anomaly detection ML pipeline for IBM's Recovery Orchestration using Random Forest and DNN, detecting over 95% of intrusions when exposed to unseen ransomware strains.
  • Implemented a secure ETL pipeline using the Cuckoo malware sandbox, VMware, and Go; it fully automated the collection of training data by infecting various VMs with 200+ ransomware strains for analysis.
  • Collaborated with cross-functional teams to integrate the ML system into IBM's Recovery Orchestration product.

Digitas Pixelpark

AI / Data Science Intern

Berlin, Germany | 08/2016 - 02/2019
  • Built and monitored 20+ ETL pipelines in Apache Airflow and Hadoop MapReduce processing millions of records every day for multiple dashboards, consistently exceeding all required data quality metrics and SLAs.
  • Designed a PoC of a dashboarding CMS using JavaScript and Python (D3.js, Flask, PySpark, React), which became the default communication tool with multiple clients.
  • Architected and implemented 10+ Tableau dashboards for data-driven marketing; used daily by marketing managers at companies such as Mercedes-Benz, McDonald's, and BMW to monitor digital marketing KPIs.

Telekom Innovation Lab

Student Researcher

Berlin, Germany | 02/2016 - 08/2016
  • Took an active role in full-stack Node.js app development.
  • Built a PoC of a messaging app on Android leveraging Bluetooth mesh networking.
  • Researched and summarized numerous papers in the field of Bluetooth-powered mesh networks.

Education

Technical University Berlin

Master's - Computer Science & Media

GPA 3.7 | 2018 - 2020
  • Coursework: Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Data Mining, Big Data, Cloud Computing, Software Engineering
  • Thesis: Design and evaluation of embedding-based topic modeling system for news articles

Free University Berlin

Bachelor's Degree - Computer Science & Media

2014 - 2017
  • Coursework: Algorithms and Data Structures, Web Technologies, Linear Algebra, Advanced Multivariate Calculus, Statistics, Media Science
  • Thesis: "Design, Development, and Evaluation of a Voice User Interface for Queries on semistructured Data"