How to find toilet paper using headless browsers?!


Almost everybody is directly or indirectly affected by the coronavirus. We have had to change our habits in a lot of different ways. One of those is the way we shop for toilet paper! The anxiety of not finding the essentials we need is now part of our lives.

But do not worry, headless browsers are here to the rescue!

So let's find out how we can use 0Browser, a headless browser as a service, to automate the task of finding toilet paper on the internet!

We all have a favorite app that we shop on or a favorite search engine that we always use. No matter what our favorites are, in situations like this we want to quickly browse through as many sources as we can to get to the gold! Once we find it, we want to make sure it's available, that it won't take too long for the source to deliver it, and, last but not least, that the price is right. Now that we know our criteria, let's get started and find our perfect toilet paper!

Usually the simplest way to design this system is to mimic what a person would do when searching for toilet paper.

How do we search on Google?

We usually start by searching for a keyword. Let's type "toilet paper" and hit enter.

google search bar
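For an automated run, typing the keyword and hitting enter can be approximated by opening the search URL directly. Here is a minimal sketch, assuming Google's `q` query parameter; the function name `buildSearchUrl` and the `extraParams` hook are made up for illustration and are not part of 0Browser.

```javascript
// Build a Google search URL for a keyword.
// `q` carries the search terms; `extraParams` is a hypothetical hook for
// any additional query parameters you decide to pass along.
function buildSearchUrl(keyword, extraParams = {}) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", keyword.trim());
  for (const [name, value] of Object.entries(extraParams)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

console.log(buildSearchUrl("toilet paper"));
// prints "https://www.google.com/search?q=toilet+paper"
```

The resulting URL is what we would hand to the headless browser as the page to open.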

Alright! In the screenshot below we can see some results right at the top of the page, but they are not really what we are looking for.

google search results

Let's navigate to the Shopping tab, where things get a little more interesting. As seen in the screenshot below, on the left-hand side (green dotted section) we have a set of filters to choose from. We also see an interesting set of results to the right (blue dotted section).

google shopping tab

Let's examine each section in more detail.

1. Search Filters

Search filters on Google are context-aware: they change depending on which product we are looking for. In the case of toilet paper, we can filter by price, brand, count, ply, condition, shipping and seller. In our design we can make some of these dynamic, but for the sake of simplicity, let's go with the default selections.
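One way to picture these filters in our own design is as a plain settings object, where an unset value means "keep the default selection". This is only a sketch: the field names and the shape of a parsed item (`price`, `brand`, `count`, and so on) are assumptions made for illustration, not something Google or 0Browser defines.

```javascript
// Hypothetical filter settings mirroring the filters listed above.
// `null` or an empty array means "no restriction" (the default selection).
const defaultFilters = {
  maxPrice: null,
  brands: [],
  minCount: null,
  ply: null,
  condition: null,
  freeShipping: null,
  sellers: [],
};

// Return true when a parsed item satisfies every filter that is set.
function matchesFilters(item, filters = defaultFilters) {
  const f = { ...defaultFilters, ...filters };
  if (f.maxPrice !== null && item.price > f.maxPrice) return false;
  if (f.brands.length > 0 && !f.brands.includes(item.brand)) return false;
  if (f.minCount !== null && item.count < f.minCount) return false;
  if (f.ply !== null && item.ply !== f.ply) return false;
  if (f.condition !== null && item.condition !== f.condition) return false;
  if (f.freeShipping !== null && item.freeShipping !== f.freeShipping) return false;
  if (f.sellers.length > 0 && !f.sellers.includes(item.seller)) return false;
  return true;
}

// Made-up sample item, for illustration only.
const sampleItem = {
  price: 12.5, brand: "Brand A", count: 12, ply: 2,
  condition: "new", freeShipping: true, seller: "Seller X",
};
console.log(matchesFilters(sampleItem)); // true: defaults restrict nothing
console.log(matchesFilters(sampleItem, { maxPrice: 10 })); // false
```

Making a filter dynamic later then only means filling in one of these fields at run time.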

2. Search Results

Google encourages all websites to follow Semantic Web design standards because they make a web crawler's task of extracting site details a lot easier. Shockingly, Google doesn't follow its own advice! So we are out of luck: there are no semantic web components to rely on in Google's search result set.
Also, search results are divided into sponsored links and real web search results. Let's focus on the main search results here.
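Once a page has been parsed, keeping only the main results is a simple partition. The sketch below assumes the parsing step has already tagged each result with a `sponsored` flag; since the page offers no semantic markup, how that flag gets detected is page-specific and is left out here. The function name `splitResults` is made up for illustration.

```javascript
// Split parsed results into sponsored links and main (organic) results.
// Assumes each result carries a boolean `sponsored` flag set by the parser;
// a missing flag is treated as "not sponsored".
function splitResults(results) {
  const sponsored = [];
  const organic = [];
  for (const result of results) {
    (result.sponsored ? sponsored : organic).push(result);
  }
  return { sponsored, organic };
}

// Made-up sample data, for illustration only.
const { organic } = splitResults([
  { title: "Sponsored listing", sponsored: true },
  { title: "Regular listing", sponsored: false },
]);
console.log(organic.map((result) => result.title)); // [ 'Regular listing' ]
```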

3. Navigation

Let's identify different ways that a user can navigate from one page to another.
1. Users can paginate to move from one page to another.

google pagination

2. Users can click on a search result to be taken to another website for more info on the item.

google shopping item

google shopping item with details

As you can see in the screenshots above, the initial click on a product only expands the card component to reveal more info. It also activates the product's external link. The next time we click on the product's title, we will be taken to an external website.

3. Users can filter results either through the filter pane on the left, as shown previously, or using the visual filter components Google provides at the bottom of the shopping search results, as displayed in the screenshot below.

google search filters

4. The final way for a user to navigate and process page data is to use the sort drop-down to prioritize the item display order, as shown here.

google sort
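Two of the navigation paths above, pagination and sorting, are easy to sketch in code. The example assumes a `start` offset parameter, as used by Google web search, and a page size of 10; both may differ on the Shopping tab, so treat them as placeholders to verify. The sort helper is a client-side stand-in for the drop-down, and all function names are made up for illustration.

```javascript
// Offsets for the first page plus `extraPages` more pages.
// `pageSize` is an assumption; check how many results a page really shows.
function pageOffsets(extraPages, pageSize = 10) {
  return Array.from({ length: extraPages + 1 }, (_, i) => i * pageSize);
}

// URL of the results page that starts at `offset`.
// Assumes the `start` offset parameter; the first page needs no offset.
function pageUrl(keyword, offset) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", keyword);
  if (offset > 0) url.searchParams.set("start", String(offset));
  return url.toString();
}

// Client-side stand-in for the sort drop-down: highest `key` value first.
// Items missing the key are treated as 0; the input array is not modified.
function sortDescending(items, key) {
  return [...items].sort((a, b) => (b[key] ?? 0) - (a[key] ?? 0));
}

console.log(pageOffsets(2)); // [ 0, 10, 20 ]
console.log(pageUrl("toilet paper", 10));
// prints "https://www.google.com/search?q=toilet+paper&start=10"
```

Alternatively, the headless browser can click the pagination links and the sort drop-down just as a person would, which avoids depending on URL parameters at all.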

Let's Get Down & Dirty!

To put everything we talked about into a practical context, we can check out a demo of how 0Browser finds toilet paper on demand here on GitHub.

Feel free to use any programming language you want for this, but we are using JavaScript and Node.js.

In this code, we set out to get all "toilet paper" listings that are in new condition and ship for free. We sort the results by review score. Then we parse the result page to fetch all the toilet paper listings found and save the title, picture, purchase URL and price wherever we see fit: in memory, a text file, a SQL or NoSQL database, etc. Then we navigate through 5 more pages, fetching and saving all the result details.
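The overall loop can be sketched independently of any particular browser API. This is not the demo's actual code: `fetchPage` is a hypothetical function you would implement with your headless browser of choice, and it is assumed to return the parsed items of one result page with the filters and sort order already applied. The de-duplication by URL is our own addition.

```javascript
// Keep only the fields we want to save.
function toRecord(item) {
  return {
    title: item.title,
    picture: item.picture,
    url: item.url,
    price: item.price,
  };
}

// Visit the first results page plus `extraPages` more and collect records.
// `fetchPage(pageIndex)` is a hypothetical async function supplied by the
// caller; it should drive the headless browser and resolve to an array of
// parsed items for that page. Items without a URL are skipped, and an item
// whose URL was already seen is stored only once.
async function collectListings(fetchPage, extraPages = 5) {
  const byUrl = new Map();
  for (let page = 0; page <= extraPages; page += 1) {
    const items = await fetchPage(page);
    for (const item of items) {
      if (item.url && !byUrl.has(item.url)) {
        byUrl.set(item.url, toRecord(item));
      }
    }
  }
  return [...byUrl.values()];
}

// Serialize records as one JSON object per line, ready for a text file.
function toJsonLines(records) {
  return records.map((record) => JSON.stringify(record)).join("\n");
}
```

The returned array is the in-memory store; `toJsonLines` covers the text file case, and the same records could just as well be inserted into a SQL or NoSQL database.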