AWS Glue is a serverless ETL service that provides a pre-built Apache Spark environment for distributed data processing. It makes developers' lives easier: you simply write your code and run it, while AWS Glue takes care of managing infrastructure, job execution, bookmarking, and monitoring. That said, AWS Glue is not just a managed Spark cluster; it also has a component library for the most common ETL tasks. The challenge is how to leverage those capabilities during local development. Fortunately, it is possible, and I am happy to share what I have learned.