Yesterday I really wanted to do something with Python. Also, a friend sent me a link to a page listing many computer science books from Springer that are currently free. I didn't want to download each book separately.
So I thought to myself: how hard can it be to download the books automatically? And lo and behold, it wasn't hard at all. Below, I'd like to describe what I learned during the exercise and show how you can automate even simple web tasks.
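Driving Chrome from Python needs a browser-automation library. This minimal sketch assumes Selenium 4.6 or newer (`pip install selenium`), whose Selenium Manager finds a matching chromedriver on its own. `START_URL` and `open_page` are hypothetical names, and the URL is a placeholder, not the page from the original link.

```python
# Assumption: Selenium 4.6+ and a local Chrome installation.
START_URL = "https://example.com/"  # placeholder for the page with the book list


def open_page(url: str):
    """Start a Selenium-controlled Chrome window, load `url`, and return the driver."""
    from selenium import webdriver  # imported lazily so this file loads without Selenium

    driver = webdriver.Chrome()  # Selenium Manager resolves a matching chromedriver
    driver.get(url)  # returns once the page has finished loading (default strategy)
    return driver
```

Calling `driver = open_page(START_URL)` opens a Chrome window marked as being controlled by automated test software; `driver.title` gives the page title, and `driver.quit()` closes the browser again.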
The result then looks like this:
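Now we can search for and select individual elements. The easiest way to do that is with XPath, and the easiest way to get an element's XPath is to copy it from Chrome (right-click the element and choose Inspect, then right-click the highlighted node in the Elements panel and choose Copy > Copy XPath):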
Then you can select that element in the code and print its text:
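A minimal sketch of this step, not the author's code. Assuming Selenium 4, the copied XPath is passed to `find_element(By.XPATH, ...)`, as in `select_text`. So the idea can be tried without a browser, `select_titles` runs a comparable query with Python's built-in `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup. The HTML snippet, class names, and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def select_text(driver, xpath: str) -> str:
    """Selenium 4 variant: return the visible text of the first matching element."""
    from selenium.webdriver.common.by import By  # lazy import, only needed with a browser

    return driver.find_element(By.XPATH, xpath).text


# Hypothetical, well-formed markup standing in for a result page.
PAGE = """
<html><body>
  <ul class="results">
    <li><h2><a class="title" href="/book/1">Book One</a></h2></li>
    <li><h2><a class="title" href="/book/2">Book Two</a></h2></li>
  </ul>
</body></html>
"""

# ElementTree's XPath subset: descendant search (.//), child steps (/), [@attr='value'].
XPATH = ".//ul[@class='results']/li/h2/a[@class='title']"


def select_titles(markup: str, xpath: str) -> list:
    """Offline variant: return (text, href) for every element matching `xpath`."""
    root = ET.fromstring(markup.strip())
    return [(el.text, el.get("href")) for el in root.findall(xpath)]


for title, href in select_titles(PAGE, XPATH):
    print(title, href)
```

XPaths copied from Chrome start with `//`; ElementTree only accepts relative paths, so prefix them with a `.` when experimenting offline. With Selenium they can be used unchanged.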
The result is as follows:
Now we can load a page and select individual elements. That means we can read all the elements on a page and remember their links, so we browse through the result pages and remember every link and title. Next, we open each book's page individually and click the download button. It is important to tell Chrome that the PDFs should be downloaded, not just opened: by default, Chrome opens PDFs in its built-in viewer.
This makes the code look like this:
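Putting the steps together, a crawler might look roughly like the sketch below. It is not the author's code: it assumes Selenium 4, takes the result-page URLs and both XPaths (one for the book links, one for the download button) as hypothetical parameters, and pauses between requests so the server isn't hammered. The three preferences are real Chrome settings: `download.default_directory` sets the target folder, `download.prompt_for_download` turns off the save dialog, and `plugins.always_open_pdf_externally` makes Chrome download PDFs instead of showing them in its viewer.

```python
import os
import time

# Sketch only: page URLs and XPaths passed to these functions are hypothetical
# placeholders; copy the real ones from Chrome's DevTools.


def chrome_download_prefs(download_dir: str) -> dict:
    """Chrome preferences that save PDFs to disk instead of opening the viewer."""
    return {
        "download.default_directory": os.path.abspath(download_dir),  # absolute path
        "download.prompt_for_download": False,  # no "Save as" dialog
        "plugins.always_open_pdf_externally": True,  # skip the built-in PDF viewer
    }


def make_driver(download_dir: str = "."):
    """Start Chrome with the download preferences applied (Selenium 4 assumed)."""
    from selenium import webdriver  # lazy import: only needed when actually crawling

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def remember(books: dict, pairs) -> dict:
    """Store (title, link) pairs keyed by link, so repeated links are kept once."""
    for title, link in pairs:
        if link and link not in books:
            books[link] = title
    return books


def collect_books(driver, page_urls, link_xpath: str, delay: float = 2.0) -> dict:
    """Visit each result page and remember the title and link of every book."""
    from selenium.webdriver.common.by import By

    books = {}
    for url in page_urls:
        driver.get(url)
        elements = driver.find_elements(By.XPATH, link_xpath)
        remember(books, ((el.text, el.get_attribute("href")) for el in elements))
        time.sleep(delay)  # be gentle with the server
    return books


def download_books(driver, books: dict, button_xpath: str, delay: float = 5.0) -> None:
    """Open every remembered book page and click its download button."""
    from selenium.webdriver.common.by import By

    for link, title in books.items():
        driver.get(link)
        driver.find_element(By.XPATH, button_xpath).click()
        print("Downloading:", title)
        time.sleep(delay)  # give the download time to start before moving on
```

A run could then be `driver = make_driver(".")`, `books = collect_books(driver, page_urls, link_xpath)`, `download_books(driver, books, button_xpath)` and finally `driver.quit()`, with `page_urls`, `link_xpath` and `button_xpath` being whatever you copied from the site. Keying the dictionary by link means a book that appears on two pages is downloaded only once.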
As a result, all the books are downloaded into the local directory you run the script from. In general, it is always important not to cause any mischief and especially not to overdo it. Here you only download the books that are currently free, and there are about 20 of them. Honestly, it would have been faster to download them by hand; I just wanted to practice a little with Python. So don't be stupid by downloading too much. In the end, you have to read the books too 😀
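One practical detail when the books land in a local directory: Chrome writes each unfinished download as a `*.crdownload` file, and quitting the browser too early can cut downloads off. A small helper like the following (a sketch; the function names are hypothetical) can poll the directory before calling `driver.quit()`. Give the last click a moment first, because the partial file may not exist yet right after the click.

```python
import os
import time


def downloads_finished(download_dir: str) -> bool:
    """True when no Chrome partial-download files (*.crdownload) remain."""
    return not any(name.endswith(".crdownload") for name in os.listdir(download_dir))


def wait_for_downloads(download_dir: str, timeout: float = 300.0, poll: float = 1.0) -> bool:
    """Poll the directory until downloads finish or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if downloads_finished(download_dir):
            return True
        time.sleep(poll)
    return downloads_finished(download_dir)  # final check after the deadline
```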
And yes, I intentionally used images for the code, so you can't simply copy and paste it to crawl the site. You should learn to do it yourself, and you can still read the code here. So I hope that next time you'll automate your task to save yourself some time 😀
Thanks a lot for reading. Maybe I could inspire you a little to automate your own small tasks or to learn Python.