HowTo: Create a web scraper with AWS Lambda
In my last article I introduced AWS Lambda and showed how to create your first Lambda function. In this article I want to develop a more useful function.
Imagine you want to track the price trend of a product on Amazon. A service that gives you the current price of the product as JSON would be handy. So in this article we will develop a function that takes the URL of an Amazon product page and returns the title and price of the product as JSON.
Loading the product page
First you need the HTML of the product page. You can load it with axios, which you install with npm install --save axios. After that you can require and use it in your function.
In the code above, you load the Amazon page of the Raspberry Pi 4, wait for the response and pass the status code on. The return code of the function will be the same as the status code that axios received.
Extracting the price and title from the page
To extract the title and price, you can use CSS selectors. I use them in this example because it is easy here: the elements we are looking for are already marked with unique IDs.
I located the price element in Safari with the developer tools.
In the console of your browser you can easily test whether a CSS selector is correct, for example on the Raspberry Pi page.
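A console check could look roughly like the sketch below. The two IDs are my assumptions about Amazon's markup and may differ on your page, so verify them in the inspector first. The function name probe is made up for this example.

```javascript
// Paste into the browser console on the product page.
// Assumed element IDs; check them in the inspector before relying on them.
const SELECTORS = { title: "#productTitle", price: "#priceblock_ourprice" };

// Returns the trimmed text of the first matching element,
// or null if the selector matches nothing.
function probe(selector, doc = document) {
  const element = doc.querySelector(selector);
  return element ? element.textContent.trim() : null;
}
```

In the console, probe(SELECTORS.price) should then print the price text. A result of null means the selector did not match and needs to be adjusted.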
Create and test the function
Now that we have found the elements we need to select, we can customize the function to return the title and price.
To apply the CSS selectors to the page, you need to parse it into a document. The jsdom library is suitable for this. You install it with npm install --save jsdom.
Now you can load the Amazon URL with axios, parse the document and select the elements with the CSS selectors. With .textContent you can read the text content of the elements. You still have to clean it up and adjust it. Then you can assemble the response and test the function locally.
The code is here:
You can test the function locally with serverless invoke local --function price. The response should contain the title and price of the product as JSON.
You can make the whole thing generic and simply pass in the URL as a parameter. After that the function looks like this:
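The URL is now a parameter that you pass along with the request. The local test must therefore hand the URL to the function as input data, for example with the --data option of serverless invoke local.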
The response should look exactly the same. Now the function is ready to be deployed.
Upload and online test of the function
You can deploy the function with serverless deploy. The upload should take about one minute. Then you can call the function through the URL that is displayed to you.
Creating an AWS Lambda function is easy, as are deployment and local testing. This lets you build simple web services that you can put to use quickly. The example shown here lets you develop a service with which you can monitor an Amazon price. I hope you found this a simple, real use case for AWS Lambda and can now develop your own projects. If you liked this article, please leave me some applause or write a comment.
Originally published at http://github.com.