7 Things We Wished We Knew Before Starting our Data Platform AWS Migration

By Dhanush Soundarapandyan, Rene Haase & Yuva Mahendran

Understand the Goal

The main goal of any cloud migration is to move data processing and storage to the cloud. The most important first step, however, is deciding what a successful migration looks like. Do you mainly want to lift and shift, with minimal coding and a fast migration? Or will you use the project as an opportunity to reduce tech debt? If the latter, decide how much tech debt you want to eliminate and whether that means reengineering the processes being migrated.

Making these decisions early enables your teams to make the right calls later, when they consider whether additional work is necessary. In our case, we took a hybrid approach: we quickly moved the whole platform onto the cloud while eliminating major pain points (e.g., outdated software versions and limited monitoring) and creating a more stable log consolidation process.

Plan Out Capacity

Doing the correct calculations up front allows you to work faster as you implement. Our subnets were too small for the number and size of the Elastic MapReduce (EMR) clusters we planned to run, and we didn’t realize that until it was too late. As a result, cluster builds failed when there weren’t enough IP addresses available to assign one address per cluster node. While we were able to fix this issue quickly, it added an unnecessary delay to our migration timeline.
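As a rough illustration of the kind of up-front calculation described here, the following Python sketch checks whether a planned set of clusters fits into a subnet at one address per node. The CIDR ranges, cluster counts, and node counts are hypothetical and not taken from our environment. The only AWS-specific fact it relies on is that AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return the number of addresses that can be assigned in a subnet."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def fits(cidr: str, clusters: int, nodes_per_cluster: int) -> bool:
    """Check whether the planned clusters fit, at one address per node."""
    return clusters * nodes_per_cluster <= usable_ips(cidr)


# Hypothetical plan: 10 concurrent clusters of 30 nodes each.
print(usable_ips("10.0.0.0/24"))    # 251
print(fits("10.0.0.0/24", 10, 30))  # False: 300 nodes, 251 addresses
print(fits("10.0.0.0/23", 10, 30))  # True: 300 nodes, 507 addresses
```

A real plan would also need headroom for anything else that shares the subnet, which this sketch ignores.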

Pick the Right Partner

While Amazon brought a wealth of technical expertise about AWS and supported the development of solutions, they didn’t deploy their solutions to the production environment. Additionally, AWS Technical Services gravitated toward AWS-native solutions, which we considered suboptimal in certain scenarios.

This is where partnering with GetInData came in. They solved our need to increase team output with their end-to-end software consulting solutions. Their developers provided coaching, valuable input during design discussions, and high-velocity software development.

Beware of S3 Eventual Consistency

We knew early on that managing data consistency would be an issue during the migration, but we underestimated its impact. While S3 offers high availability, it does not offer a Service Level Agreement (SLA) for when data changes become accessible through the S3 APIs. This has two negative implications when building a data pipeline that requires fast processing. First, when a file is written to S3, there is a lag until the file appears in S3 list operations. Second, when a file is overwritten, reads issued too soon after the overwrite might return the old version of the file.
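One common way to cope with the listing lag described here is to poll until the file becomes visible before starting the next pipeline step. The sketch below is a generic, hypothetical helper and not the approach we used in production. `wait_until_visible` and its parameters are illustrative names, and the `is_visible` callable stands in for whatever check fits your pipeline, for example a wrapper around an S3 list call.

```python
import time
from typing import Callable


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_visible() with exponential backoff until it returns True.

    Returns True if the check succeeded, or False once timeout_s seconds
    of waiting have been spent without success.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        if is_visible():
            return True
        if waited >= timeout_s:
            return False
        # Never sleep past the timeout; double the delay up to a cap.
        pause = min(delay, timeout_s - waited)
        sleep(pause)
        waited += pause
        delay = min(delay * 2, max_delay_s)


# Stand-in check that only succeeds on the third attempt.
attempts = {"count": 0}


def fake_listing_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until_visible(fake_listing_check, sleep=lambda s: None))
```

Polling only addresses the first problem, the listing lag. It does not help with stale reads after an overwrite, where writing each new version under a new key is a safer pattern.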

Consider the Lack of an S3 Cross-Region Replication SLA

IAS’s endpoints are deployed in multiple regions worldwide. The data is collected in S3 buckets in the respective regions and replicated into a central S3 bucket located in the us-east-1 region.
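Without a replication SLA, a job reading from the central bucket cannot assume that every regional file has arrived. A simple completeness check before processing can be sketched as follows. This is an assumption about how such a check might look, not a description of our pipeline. The function name, region names, and keys are hypothetical, and the sketch assumes that object keys are unique across regions and unchanged by replication.

```python
from typing import Dict, Iterable, List


def replication_gaps(
    regional_keys: Dict[str, Iterable[str]],
    central_keys: Iterable[str],
) -> Dict[str, List[str]]:
    """Return, per region, the keys not yet visible in the central bucket."""
    replicated = set(central_keys)
    gaps: Dict[str, List[str]] = {}
    for region, keys in regional_keys.items():
        missing = sorted(set(keys) - replicated)
        if missing:
            gaps[region] = missing
    return gaps


# Hypothetical listings; real ones would come from S3 list operations.
regional = {
    "eu-west-1": ["logs/2020-01-01/a.gz", "logs/2020-01-01/b.gz"],
    "ap-southeast-1": ["logs/2020-01-01/c.gz"],
}
central = ["logs/2020-01-01/a.gz", "logs/2020-01-01/c.gz"]

print(replication_gaps(regional, central))
# {'eu-west-1': ['logs/2020-01-01/b.gz']}
```

An empty result means every listed regional file is present centrally, so downstream processing can start.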

Think About the Right Orchestration Solution

We built a custom EMR launch operator for Airflow, which gives us a simple and standard way to set up and tear down EMR clusters. In retrospect, the time spent evaluating different orchestration solutions paid off, as orchestration tasks have been surprisingly easy so far.
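Our operator itself is not shown in this post. To illustrate the underlying idea of a standard set-up and tear-down wrapper, here is a minimal plain-Python sketch that is not Airflow code. `transient_cluster`, `launch`, and `terminate` are hypothetical names. In practice the two callables could wrap calls such as boto3's `run_job_flow` and `terminate_job_flows`, but here they are replaced by fakes so the example runs on its own.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def transient_cluster(
    launch: Callable[[], str],
    terminate: Callable[[str], None],
) -> Iterator[str]:
    """Launch a cluster, yield its id, and always terminate it afterwards."""
    cluster_id = launch()
    try:
        yield cluster_id
    finally:
        # Runs even if the work inside the with-block raises.
        terminate(cluster_id)


# Fake launch/terminate functions standing in for real EMR calls.
events: List[str] = []


def fake_launch() -> str:
    events.append("launch")
    return "j-EXAMPLE"


def fake_terminate(cluster_id: str) -> None:
    events.append(f"terminate {cluster_id}")


with transient_cluster(fake_launch, fake_terminate) as cluster_id:
    events.append(f"run steps on {cluster_id}")

print(events)
# ['launch', 'run steps on j-EXAMPLE', 'terminate j-EXAMPLE']
```

The point of the pattern is that teardown is guaranteed, so a failed job does not leave a cluster running and accruing cost.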

Automate Testing

As in many migration projects, the length of the project is determined by the length of the testing phase. It was critical that testing and validation processes were automated, because data elements can be complex and a simple text comparison doesn’t always produce the desired results. We created two tools to automate testing.
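The tools themselves are not reproduced in this post. As a separate, hypothetical illustration of why simple text comparison falls short, the sketch below compares two JSON records structurally, ignoring key order and tolerating tiny floating-point differences. The function names, the tolerance, and the sample records are assumptions made for the example.

```python
import json
import math
from typing import Any


def _is_number(value: Any) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def values_equal(a: Any, b: Any, rel_tol: float = 1e-9) -> bool:
    """Compare parsed JSON values, ignoring key order and float noise."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            values_equal(a[key], b[key], rel_tol) for key in a
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            values_equal(x, y, rel_tol) for x, y in zip(a, b)
        )
    if _is_number(a) and _is_number(b):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


# The same record as written by an old and a new pipeline (made-up data).
old_line = '{"id": 7, "ctr": 0.30000000000000004, "tags": ["a", "b"]}'
new_line = '{"tags": ["a", "b"], "ctr": 0.3, "id": 7}'

print(old_line == new_line)                                   # False
print(values_equal(json.loads(old_line), json.loads(new_line)))  # True
```

The two lines differ as text but describe the same record, which is exactly the case where a plain diff reports a false mismatch.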

Moving to the cloud will provide your company with plenty of benefits. We decided to leverage AWS, and that choice has not disappointed so far. However, the effort of migrating to the cloud can easily balloon if the project is not carefully planned, so please do your research up front. We hope that our tips will make your cloud migration a little bit easier.

Update: At re:Invent in December 2020, AWS announced improvements to S3 consistency that remove the need for EMRFS in time-critical data pipelines.
