Why a serverless Python layer generator?
If you have ever worked with Lambda layers, you know how central they are to keeping your code well organized. They are a convenient and effective way to share code across Lambda functions, and they help reduce the size of uploaded archives.
How you build a layer on a local machine varies with the OS and the language used. If you try to do it by copying a Python virtual environment, you will need one script for Windows and another for Linux. The best approach is to do it entirely in Python, which makes it OS agnostic. For development purposes or infrastructure-as-code (IaC) setups, an even better approach is to make it serverless.
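To make the "full Python" idea concrete, here is a minimal sketch of an OS-agnostic build: install the requested packages into the `python/` folder that Lambda layers expect, then zip it with the standard library. The function names and folder layout are illustrative, not the repo's actual code.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def install_packages(packages, target):
    """Install packages into `target` with pip -- same command on any OS."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--target", str(target), *packages]
    )


def zip_layer(build_dir, out_zip="layer.zip"):
    """Zip `build_dir` so archive paths start with 'python/',
    the prefix Lambda uses to put layer code on sys.path."""
    build_dir = Path(build_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted((build_dir / "python").rglob("*")):
            zf.write(path, path.relative_to(build_dir))
    return out_zip


# Usage (the install step downloads packages, so it needs network access):
# install_packages(["requests"], "build/python")
# zip_layer("build")
```

Because both steps use only `sys.executable` and the standard library, the same script runs unchanged on Windows, macOS, and Linux.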
By the end of the article you will be able to create a layer through the AWS CLI or a Lambda call, without needing the code on your local machine. If you want to go further, the Lambda can be put behind API Gateway to share it with your team or with child AWS accounts.
And yes, it’s lightning fast too :)
You can find the repository here.
The code is pretty straightforward. It can be used from a local machine without many changes and is OS agnostic. I used Terraform to deploy the Lambda to the cloud easily.
For Python packages with underlying C++ code, the Lambda needs a custom GCC layer. Because it is not the same for python3.8 as for python3.7 and lower, you can switch to the python3.7- branch for those runtimes.
Everything you need for the deployment is explained in the readme.md.
Once deployed, the Lambda shows up in your AWS account.
I will use the default settings and my personal (not public) bucket vv-scraping-utils-eu, with selenium and scrapy as the packages to download. Scrapy has underlying C++ dependencies, so we can test that the GCC layer is working well.
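The same call can be made from Python with boto3 instead of the AWS CLI. In this sketch, the function name and the payload keys (`packages`, `bucket`) are illustrative assumptions; check the repo's readme for the actual contract.

```python
import json


def layer_request(packages, bucket):
    """Serialize the request payload the generator Lambda receives.
    (Key names are assumptions, not the repo's documented schema.)"""
    return json.dumps({"packages": packages, "bucket": bucket})


def invoke_generator(function_name, packages, bucket):
    """Invoke the deployed Lambda (needs AWS credentials and boto3)."""
    import boto3  # imported here so the payload helper stays dependency-free

    client = boto3.client("lambda")
    resp = client.invoke(
        FunctionName=function_name,
        Payload=layer_request(packages, bucket),
    )
    return json.load(resp["Payload"])


# e.g. invoke_generator("layer-generator", ["selenium", "scrapy"], "vv-scraping-utils-eu")
```

The AWS CLI equivalent is simply `aws lambda invoke` with the same JSON payload.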
And it works! It took 20 s to download the packages, zip them, upload the archive to S3, and create the layer!
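For context, the final "make the layer" step boils down to a single API call: publishing the uploaded zip as a layer version. A minimal sketch of that step, assuming the zip is already in S3 (the function and parameter names here are illustrative):

```python
def publish_layer_from_s3(layer_name, bucket, key, runtimes=("python3.8",)):
    """Publish an S3-hosted zip as a new Lambda layer version
    (needs AWS credentials and boto3)."""
    import boto3

    client = boto3.client("lambda")
    resp = client.publish_layer_version(
        LayerName=layer_name,
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=list(runtimes),
    )
    # the ARN you attach to your functions with update_function_configuration
    return resp["LayerVersionArn"]
```

Each call creates a new immutable version of the layer, so rerunning the generator never breaks functions pinned to an older version.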
If you find it slow, allocate more than 512 MB of memory. I like to set at least 1024 MB, or even the maximum, because it’s faster and it still fits in the free tier :)
Thank you for reading my first article!
This article is the very first of a long project. In every data science project, first we Extract, then we Transform, and finally we Load! Only once everything is well packed in a data warehouse can we do some analysis or data science.
We are just at the beginning and, as you can guess, we are going to do some serverless web scraping. We will use AWS Lambda and DynamoDB.
This is going to be a long journey; if you don’t want to miss anything about this story, please follow me on Medium to get updates, and connect with me on LinkedIn.
Also, star this repository, where you will find all my cloud and data articles.