Sunday, 31 July 2016

Apache Hadoop : Weather Analysis using HBASE, MapReduce and HDFS



Weather Analysis using HBASE, MapReduce and HDFS example:


The project downloads weather history data for most of the countries in the world and puts it into HDFS. Once the data is in HDFS, mapper and reducer jobs run against it and save the analysis results to HBase. The code is written in Java and executed on Hadoop, with HBase as the NoSQL database.

Here are the steps to run through the application:

1. Run the shell script and Python code to parse the webpage for all country codes, then use each country code to download the XML file for that country
All the XML files are saved as xml_files/weather_xxx.xml (xxx is the country code)
2. Copy the XML files to HDFS
hadoop fs -mkdir /user
hadoop fs -mkdir /user/hadoop
hadoop fs -mkdir /user/hadoop/data
hadoop fs -ls /user/hadoop/data
hadoop fs -copyFromLocal /home/hadoop-weather-analysis/xml_files /user/hadoop/data/
3. Create the weather tables in the HBase database (from the HBase shell)
create 'weather', 'mp'
create 'weather_sum', 'mp'
4. Load the XML files from HDFS into the weather table in HBase
hadoop jar loadXml2.jar com.hadoopgyaan.hbase.dataload.HBaseDriver /user/hadoop/data/xml_files /out1 weather
5. Check the data in the HBase table
count 'weather'
t = get_table 'weather'
t.scan
6. Process the data to get the monthly data for the past 10 years and save it back to an HBase table
hadoop jar processweather.jar com.hadoopgyaan.hbase.process.DBDriver
7. Check the results in the HBase table
scan 'weather_sum'
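
The monthly roll-up in step 6 is done by the Java MapReduce job in processweather.jar, whose source is not shown in this post. As a rough plain-Python sketch of the same map/reduce idea, the logic might look like the code below. Two assumptions are made purely for illustration: each record is a hypothetical (country code, ISO date, temperature) tuple, and the monthly figure is an average (the post does not say which statistic the real job computes).

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit ((country, 'YYYY-MM'), temperature) for one daily record."""
    country, date, temperature = record
    # The first 7 characters of an ISO date ('2016-07-01') are the year and month.
    return (country, date[:7]), temperature


def reduce_month(values):
    """Reduce step: average all temperatures that share one key (assumed statistic)."""
    return sum(values) / len(values)


def monthly_summary(records):
    """Group mapped records by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        groups[key].append(value)
    return {key: reduce_month(values) for key, values in groups.items()}


# Hypothetical sample data, for illustration only.
sample = [
    ("IN", "2016-07-01", 30.0),
    ("IN", "2016-07-02", 32.0),
    ("IN", "2016-06-30", 28.0),
    ("US", "2016-07-01", 25.0),
]
summary = monthly_summary(sample)
```

In the actual pipeline the per-month results are saved to HBase (the weather_sum table scanned in step 7) rather than kept in a dictionary.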

Downloads : 

Python, HBase and MapReduce Coding

I hope this tutorial helps you. If you have any questions or problems, please let me know.

Happy Hadooping with Patrick..

Saturday, 30 July 2016

Python : Python Script that Parse "Jobs"


Python Script that Parse "Jobs"

This small Python program scrapes a 'Hiring Now' page on Hacker News, or any other jobs website, and saves only the jobs that contain certain keywords, e.g. 'New York' or 'San Francisco'. You can also use the keywords to find specific kinds of jobs, e.g. 'Machine Learning'.
You need to install Beautiful Soup 4 in order to use this program:
$ pip install beautifulsoup4
Tested with python2.7-32.

Problems

  • It will grab jobs that mention any of the keywords in your list.
  • It will break if someone creates a link with 'More' as the text :(
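
The first problem above comes from matching on any keyword. A minimal sketch of that filter, with a stricter all-keywords variant for comparison, might look like the following. It is not the downloadable script itself: the function names and sample postings are hypothetical, and it works on plain strings rather than on Beautiful Soup output.

```python
def matches_any(text, keywords):
    """Return True if the posting mentions at least one keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def matches_all(text, keywords):
    """Return True only if the posting mentions every keyword (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def filter_jobs(postings, keywords, require_all=False):
    """Keep the postings that pass the chosen keyword test."""
    match = matches_all if require_all else matches_any
    return [posting for posting in postings if match(posting, keywords)]


# Hypothetical postings, for illustration only.
postings = [
    "Acme | New York | Machine Learning engineer",
    "Initech | San Francisco | Backend developer",
    "Globex | Remote | Machine Learning researcher",
]
keywords = ["New York", "Machine Learning"]

any_hits = filter_jobs(postings, keywords)                    # any keyword is enough
all_hits = filter_jobs(postings, keywords, require_all=True)  # every keyword required
```

With the any-keyword test, the remote 'Machine Learning' posting is kept even though it is not in New York; requiring all keywords narrows the result to postings that satisfy both.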
I hope this tutorial helps you. If you have any questions or problems, please let me know.

Happy Hadooping with Patrick..