Scraping the web - part 1

One of the projects that I have to do for school is to create a Ruby CLI Gem. For someone who is relatively new to Ruby, it's kind of scary. I feel like I'm still learning the language, so trying to put something together from start to finish is a little intimidating. I will finish, though. My idea came from my frugalness (is that even a word?). I love to save money, so my project is going to scrape a popular couponing website.

What is scraping? Well, Ruby has a gem called Nokogiri and a standard library called OpenURI that together let you pull information off of websites to use in a program. This is one of the ways developers are able to build apps that use information from a verified source without being a part of that source. Like a (for lack of a better term) knock-off MLB site that gets its stats from the MLB website. The programmer just has to pull the right HTML/CSS selectors out of the nodes that Nokogiri parses, and then that data can be used to build your own site/program/app with the same information.
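To make that concrete, here's a tiny sketch of OpenURI and Nokogiri working together. The URL and the CSS selector are made up for illustration; a real scraper would use whatever selectors match the site you're targeting.

```ruby
require 'open-uri'
require 'nokogiri'

# Open the page and parse the HTML into a Nokogiri document.
# (The URL and the ".deal .title" selector are placeholders, not a real site.)
doc = Nokogiri::HTML(URI.open("https://example.com/coupons"))

# Each node Nokogiri finds responds to .text, .attr, etc.
doc.css(".deal .title").each do |node|
  puts node.text.strip
end
```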

Now that you know what scraping is...

Here is a basic idea of how the gem is built.

1. Set up the gem.
2. Spec out a general idea of how the gem will run and what it will prompt the user for, using hard-coded data.
3. Get the scraped data using a combination of Nokogiri and Pry to check your output (there's a rough sketch of this after the list).
4. Create classes to hold the data (also sketched below).
5. Determine the best way to organize your data. Since we aren't using a database, the choices are variables, arrays, or hashes.
6. Figure out what information the user needs to achieve your goal and what they won't be able to use. What you decide to keep may be different from what you thought you needed when you started.
7. Update the command line interface so that it runs on your scraped data instead of the hard-coded data (sketched below).
8. Run your gem over and over to make sure you haven't missed any little quirks, like needing a regex to get consistent data across the multiple pages/items being scraped (sketched below).
9. Refactor your code.
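For step 3, this is roughly what checking your scraped output with Pry looks like. The class name, URL, and selectors here are placeholders, not the actual couponing site I'm scraping.

```ruby
require 'open-uri'
require 'nokogiri'
require 'pry'

class Scraper
  # Hypothetical URL and selectors; the real site's markup will differ.
  def self.scrape_deals
    doc = Nokogiri::HTML(URI.open("https://example.com/coupons"))

    doc.css(".deal").map do |node|
      binding.pry  # pause here and poke around `node` to confirm the selectors
      { title: node.css(".title").text.strip, store: node.css(".store").text.strip }
    end
  end
end
```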
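For steps 4 and 5, here's a rough sketch of a class that holds the scraped data. Without a database, a class-level array stands in as the collection (the attribute names are just guesses at what a coupon might need).

```ruby
class Coupon
  attr_accessor :title, :store, :expiration

  @@all = []  # no database, so an array on the class stands in for one

  def initialize(title, store, expiration)
    @title = title
    @store = store
    @expiration = expiration
    @@all << self
  end

  def self.all
    @@all
  end
end
```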
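For step 7, the CLI just swaps its hard-coded data for whatever the scraper and classes now provide. Something along these lines, reusing the hypothetical Scraper and Coupon classes from above:

```ruby
class CLI
  def call
    # Build Coupon objects from real scraped data instead of hard-coded hashes.
    Scraper.scrape_deals.each do |deal|
      Coupon.new(deal[:title], deal[:store], nil)
    end

    puts "Here are today's deals:"
    Coupon.all.each_with_index do |coupon, i|
      puts "#{i + 1}. #{coupon.title} - #{coupon.store}"
    end
  end
end
```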
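And for step 8, this is the kind of regex cleanup I mean: scraped strings often come back with stray whitespace or extra text, and a small gsub keeps them consistent from page to page (the sample string is made up).

```ruby
raw = "  Save 20%!!   (online only)\n"

# Strip whitespace and drop the parenthetical so every item looks the same
# no matter which page it came from.
clean = raw.strip.gsub(/\s*\(.*\)\s*/, "")
puts clean  # => "Save 20%!!"
```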

Soon, I will be submitting this project and you will get to see how it works.
