And also creating a robust pipeline to move data from AWS S3 into Azure File Share by using Azure Data Factory


Machine learning teams that train on multiple VMs face a recurring problem: every VM has to download the full dataset before training can start. Each VM then needs a large attached disk just to hold its own copy of the same data. Azure File Share solves this by sharing one storage drive across multiple VMs over the industry-standard SMB protocol. I will…

Making Sense of Big Data

Inspired by the Netflix Simian Army, this post builds a well-managed, fault-tolerant system using AWS Auto Scaling Groups, with CI/CD through CodeDeploy and CodePipeline and GitHub as the source control.

Every architect dreams of building a sophisticated system for their development environment. A fault-tolerant system protects against single points of failure and provides high availability and redundancy. Tools such as AWS Auto Scaling Groups now do much of this automatically, and we should use them. I read a piece on the Netflix Technology Blog and was motivated to write this post. It covers a comprehensive end-to-end system, starting from GitHub and ending in your browser. …

Data is the new oil of the digital economy, and data engineers are needed more than ever. Data engineers are responsible for provisioning and setting up big data platforms on cloud or on-premises servers, which includes setting up AWS big data tools like Glue and Athena.

Let’s start with the formal definitions of these services, beginning with S3. Amazon Simple Storage Service (Amazon S3) is an object storage service that stores and protects any amount of data for a range of use cases, such as data lakes, websites, archives, enterprise applications, IoT devices…

Making life easy with a virtual assistant.

Have you ever gone to bed and realized you forgot how much this month's bill was? To check it, you need to get out of bed, open your laptop, log in with your credentials, open a browser, go to the AWS console, and check the billing dashboard. Fear not: AWS Lambda combined with Siri can put your billing reports at your fingertips or, better still, deliver them by voice.

So what is AWS Lambda? AWS Lambda is a serverless compute service from AWS that runs your code as functions. We don’t have to provision…
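The billing lookup described above could be sketched roughly as follows. This is a minimal illustration, not the code from the post: it assumes the function reads the month-to-date cost through the Cost Explorer API (`get_cost_and_usage` in boto3) and returns a sentence that a voice shortcut could read aloud. The helper names `month_to_date_period` and `billing_sentence`, the wording of the sentence, and the response shape returned by `lambda_handler` are all hypothetical.

```python
import datetime as dt


def month_to_date_period(today):
    """Return (start, end) ISO dates covering the current month up to today.

    Cost Explorer treats the end date as exclusive, so end is tomorrow.
    """
    start = today.replace(day=1)
    end = today + dt.timedelta(days=1)
    return start.isoformat(), end.isoformat()


def billing_sentence(ce_client, today):
    """Ask Cost Explorer for the month-to-date cost and phrase it for speech."""
    start, end = month_to_date_period(today)
    response = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
    )
    # One month of MONTHLY data yields a single entry in ResultsByTime.
    total = response["ResultsByTime"][0]["Total"]["UnblendedCost"]
    amount = float(total["Amount"])
    return f"Your AWS bill so far this month is {amount:.2f} {total['Unit']}."


def lambda_handler(event, context):
    """Lambda entry point (assumed to sit behind an HTTP trigger)."""
    import boto3  # imported here so the helpers above can be tested offline

    client = boto3.client("ce")  # "ce" is the Cost Explorer service name
    return {"statusCode": 200, "body": billing_sentence(client, dt.date.today())}
```

For this sketch to work, the function's execution role would need the `ce:GetCostAndUsage` permission. Passing the client into `billing_sentence` keeps the date and formatting logic checkable without touching AWS.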

and no, we are not talking about the Azure Monitor with predefined metrics.


We have all seen the key metrics on the Monitoring tab of the VM page. They are useful: the CPU metrics show whether the CPU has been busy, the Network In metric shows when the VM is receiving data from the outside world, and the Disk Operations/Sec metric shows whether the VM is writing to disk. But they are not effective for the custom services that we build on the VM. …

To focus more on making quality Docker images and save your valuable primary disk space from those images and containers.


The picture pretty much sums up what happens when you build Docker images from a Dockerfile. We end up creating so many images and containers for our machine learning models that, before we know it, the VM is running out of disk space. Granted, you can change the disk size on AWS EC2, but the same cannot be said of Azure VMs, as they come with a fixed 32GB size and we…

How to deploy everything with the click of a button or a single command


We have all faced this: while testing a new service or library, we need to create a fresh VM, which eats up precious minutes every day. Isn’t there something that could do this automatically? Fear not, Azure templates are here to solve this problem. In this blog, I will write about how to create templates for your resources so that you can deploy them with a few clicks in the future.
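To give a feel for what such a template contains, here is a small sketch that builds a minimal Azure Resource Manager (ARM) template and prints it as JSON. It is an illustration under assumptions, not the template from the post: a storage account stands in for the VM because a full VM template also needs networking resources, and the function name `storage_account_template` and the parameter name `storageAccountName` are hypothetical.

```python
import json


def storage_account_template(sku="Standard_LRS"):
    """Build a minimal ARM template that deploys one storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        # The account name is supplied at deployment time (assumed parameter).
        "parameters": {
            "storageAccountName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2019-06-01",
                "name": "[parameters('storageAccountName')]",
                # Reuse the region of the target resource group.
                "location": "[resourceGroup().location]",
                "sku": {"name": sku},
                "kind": "StorageV2",
                "properties": {},
            }
        ],
    }


if __name__ == "__main__":
    # Print the template; redirect the output to a file such as azuredeploy.json.
    print(json.dumps(storage_account_template(), indent=2))
```

Once saved to a file, a template like this can be deployed from the Azure CLI with `az deployment group create --resource-group <group> --template-file azuredeploy.json --parameters storageAccountName=<name>`, which is the "single command" route mentioned above.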


  1. An Azure account with Azure…

Complete study plan for passing the DP-200 and DP-201 examination

I recently completed the Microsoft Azure Data Engineer certification exams. I have attached a link to Acclaim below where you can view my certification.



To become a Microsoft Certified Azure Data Engineer you need to pass two exams:

  1. Implementing an Azure Data Solution: DP-200
  2. Designing an Azure Data Solution: DP-201

Implementing an Azure Data Solution deals with implementing the core Microsoft data services, such as Azure Cosmos DB, Azure Synapse Analytics, Azure Data Factory, Azure Stream Analytics, Azure Databricks, Azure Blob Storage, and Azure Data Lake Storage Gen…

Integrating all your database servers, both in the cloud and on localhost, with the database tools and IDEs that you love.

This is a straightforward, comprehensive blog for programmers and developers on how to connect local and cloud database servers to your development tools and IDEs to speed up your workflow. With this setup, especially with the IDE, you can focus more on the code, and debugging the database logic becomes easier. …

so that you can code from your host machine on the instance with VS Code facilities.

Well, it's not quite like what the picture shows, but you get the point. We are connecting an Amazon EC2 instance to Visual Studio Code, but with a little tweaking you can use this post to connect any cloud instance (GCP or Azure) and code on it. My motivation was to edit data-serialization files on the instance from a GUI (because of YAML indentation). I do not suggest writing code directly on the instance, as Git already exists to ease that process.


  • Amazon EC2 instance in the running state
  • Visual Studio Code installed on the host machine

Amazon EC2:

From ec2, the…

Sulabh Shrestha

Data Engineer at MotionsCloud | Microsoft Certified: Azure Data Engineer 👦 🏢
