Recipe Extractor
- 3 Devlogs
- 4 Total hours
Takes any recipe from a website and removes the boring story that always comes at the start of it, because no one wants to know why these cookies are your entire life story.
Scraping & Parsing logic fully complete!
After a lot of work, I converted the code from printing the recipe to the console to returning it as a JsonObject. Ideally, this will let a frontend website fetch the recipe through an API.
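Here is a minimal sketch of the API side, assuming a plain JDK setup. The built-in `com.sun.net.httpserver.HttpServer` stands in for whatever web framework the project ends up using, and the JSON is built by hand instead of with a `JsonObject`. The `RecipeApi` class, the `/recipe` path, and the fixed ingredient list are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RecipeApi {
    // Builds the JSON body by hand; the real project returns a JsonObject instead.
    static String recipeJson(List<String> ingredients) {
        StringBuilder sb = new StringBuilder("{\"recipeIngredient\":[");
        for (int i = 0; i < ingredients.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(ingredients.get(i))).append('"');
        }
        return sb.append("]}").toString();
    }

    // Escapes only backslashes and double quotes (enough for this sketch).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serves GET /recipe on localhost; port 0 picks any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/recipe", exchange -> {
            // Hypothetical fixed data; a real handler would call the scraper here.
            byte[] body = recipeJson(List.of("2 eggs", "1 cup sugar"))
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            // Lets a frontend served from another origin read the response.
            exchange.getResponseHeaders().set("Access-Control-Allow-Origin", "*");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpServer server = start(0);
        String url = "http://localhost:" + server.getAddress().getPort() + "/recipe";
        // Fetch once, the way a frontend would, then shut the server down.
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // {"recipeIngredient":["2 eggs","1 cup sugar"]}
        server.stop(0);
    }
}
```

A frontend would then call the endpoint with something like `fetch(url)` and read the `recipeIngredient` array. The `Access-Control-Allow-Origin` header is only needed if the frontend is hosted on a different origin from the API.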
I tested the program with various recipe websites and they all seem to work! Originally, I was having trouble with website security features, but by using Selenium I was able to get past them.
For the next devlog, I hope to be done with API integration and some basic frontend testing.
Ingredients can now print on the console
I have just finished a prototype that, when given a recipe link, prints the recipe's ingredients. This took a lot of time, and I had to change which package I used to make it work. The trick really boiled down to the fact that most recipe websites store their data in JSON, which makes it easy to scrape and extract. Here is an example of the ingredients for some chocolate brownies printing in the console.
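The devlog does not say exactly where that JSON lives. Many recipe sites, however, embed schema.org `Recipe` data as JSON-LD inside a `<script type="application/ld+json">` tag, with the ingredients listed under `recipeIngredient`. The JDK-only sketch below assumes that layout. `IngredientExtractor` and the sample HTML are hypothetical, and the regular expressions are a shortcut: a real version would use JSoup to find the script tag and a proper JSON parser to read it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IngredientExtractor {
    // Assumed location of the data: <script type="application/ld+json">...</script>
    private static final Pattern LD_JSON = Pattern.compile(
            "<script[^>]*type\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // schema.org Recipe property holding the ingredient list.
    // Shortcut: stops at the first ']', so an ingredient containing ']' would break it.
    private static final Pattern INGREDIENTS = Pattern.compile(
            "\"recipeIngredient\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
    // A JSON string literal, allowing backslash escapes inside it.
    private static final Pattern JSON_STRING = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> extractIngredients(String html) {
        List<String> result = new ArrayList<>();
        Matcher script = LD_JSON.matcher(html);
        while (script.find()) {
            Matcher list = INGREDIENTS.matcher(script.group(1));
            if (!list.find()) continue; // this JSON-LD block is not a recipe
            Matcher item = JSON_STRING.matcher(list.group(1));
            while (item.find()) {
                result.add(unescape(item.group(1)));
            }
            return result; // use the first block that has ingredients
        }
        return result; // empty if no recipe data was found
    }

    // Decodes \n, \t, \r and \\uXXXX; any other escaped character (\" \\ \/) becomes itself.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            if (next == 'n') out.append('\n');
            else if (next == 't') out.append('\t');
            else if (next == 'r') out.append('\r');
            else if (next == 'u' && i + 4 < s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                i += 4;
            } else out.append(next);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample page: the story is in the body, the data in JSON-LD.
        String html = "<html><head><script type=\"application/ld+json\">"
                + "{\"@type\":\"Recipe\",\"name\":\"Brownies\","
                + "\"recipeIngredient\":[\"1/2 cup butter\",\"1 cup sugar\",\"2 eggs\"]}"
                + "</script></head><body><p>My grandmother's story...</p></body></html>";
        extractIngredients(html).forEach(System.out::println);
    }
}
```

The structured data sits in its own script tag, separate from the page's prose, so reading it skips the story entirely. That is the same property the project relies on.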
Website Scraper
Using the JSoup Java library, I tried implementing a website scraper, which takes all of the elements on a website and displays them as plain text on the console. This worked for less secure sites such as Wikipedia, but not on most recipe sites, which may use Cloudflare. This is likely due to Cloudflare's anti-scraping measures. To work around this, I may experiment with other Java libraries that usually have a higher success rate.
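With JSoup, the approach in this entry is roughly `Jsoup.connect(url).get().text()`. The sketch below is a self-contained JDK-only version that works on HTML that has already been downloaded. It drops `<script>` and `<style>` blocks, strips the remaining tags, decodes a few common entities, and collapses whitespace. `PlainText` is a hypothetical name, and this is a rough regex approximation, not a real HTML parser. It also does nothing about Cloudflare: a site behind bot checks may return a challenge page instead of the recipe HTML, which is the problem the later devlog solves with Selenium.

```java
import java.util.regex.Pattern;

public class PlainText {
    // Whole <script>/<style> blocks, whose contents are not visible text.
    private static final Pattern SCRIPT_STYLE = Pattern.compile(
            "<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        // Tags become spaces, so inline tags mid-word may add an extra space.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last to avoid double-decoding.
        text = text.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'").replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><h1>Brownies</h1><p>Mix &amp; bake.</p></body></html>";
        System.out.println(toPlainText(html)); // Brownies Mix & bake.
    }
}
```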
Also, I don’t know why my coding time was saved under the ‘user’ project in Hackatime rather than a frictionless project. If anyone can help with that, that would be lovely.