Showing posts with label Python. Show all posts

Monday, 29 August 2016

Credit Card Fraud Detection Python Project



This solution is built with SQLite3, which is integrated into Python. The information received from the bank has been split into different SQLite tables. The idea is to keep the database updated in real time and to extract any fraudulent data, identified here according to three assumed rules (sketched just after this list):
  1. The first anomaly is a withdrawal of less than a certain amount from the user's account that is unlikely to have actually taken place.
  2. The second anomaly is based on frequency: multiple transactions on the same date.
  3. The third anomaly is based on the preferred location of an individual user, gathered from the spending trend provided.
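Here is a minimal sketch of how these three rules could be expressed as SQLite queries from Python. The table names, column names and thresholds below are assumptions for illustration, not the schema of the actual project:

import sqlite3

# Assumed thresholds -- tune to the bank's guidelines
MIN_AMOUNT = 100     # rule 1: withdrawals below this are suspicious
MAX_PER_DAY = 3      # rule 2: more than this many transactions per day is suspicious

conn = sqlite3.connect('bank.db')          # placeholder database file
cur = conn.cursor()

# Rule 1: unusually small withdrawals
small = cur.execute(
    "SELECT * FROM transactions WHERE amount < ?", (MIN_AMOUNT,)).fetchall()

# Rule 2: too many transactions on the same card on the same date
frequent = cur.execute("""
    SELECT card_id, txn_date, COUNT(*) AS n
    FROM transactions
    GROUP BY card_id, txn_date
    HAVING n > ?""", (MAX_PER_DAY,)).fetchall()

# Rule 3: transaction location differs from the user's preferred location
misplaced = cur.execute("""
    SELECT t.* FROM transactions t
    JOIN users u ON u.card_id = t.card_id
    WHERE t.location <> u.preferred_location""").fetchall()

for row in small + frequent + misplaced:
    print(row)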

Check out the above case study at the web link below.


Tuesday, 23 August 2016

Walmart Recruiting Store Sales Forecasting Python Analysis


One challenge of modelling retail data is having to make decisions on limited history. Christmas comes only once a year, and so does the chance to see how strategic decisions affected the bottom line. In this recruiting competition, job-seekers are given historical sales data for 45 Walmart stores located in different regions, each containing a number of departments. Participants must project the sales for each department in each store. To add to the challenge, selected holiday markdown events are included in the dataset; these markdowns are known to affect sales, and the problem is to predict which departments are affected and to what extent.
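As a hedged illustration of the task itself (this is not the analysis linked below), a naive baseline could predict each store/department pair's future weekly sales from its historical average. The column names assume the layout of the Kaggle train.csv file:

import pandas as pd

# Naive baseline: each store/department's forecast is its historical mean weekly sales
train = pd.read_csv('train.csv', parse_dates=['Date'])   # assumed Kaggle training file
baseline = (train.groupby(['Store', 'Dept'])['Weekly_Sales']
                 .mean()
                 .rename('Predicted_Weekly_Sales')
                 .reset_index())
print(baseline.head())

A real entry would go much further, for example by treating the IsHoliday weeks separately, but the grouping above is the core shape of the problem: one forecast per (store, department) pair.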

Check out the link below for the above case study.

Walmart Recruiting Store Sales Forecasting Python Analysis

I hope this tutorial will surely help you. If you have any questions or problems please let me know.


Happy Hadooping with Patrick..



Saturday, 20 August 2016

2016 US Presidential Election Tweets Crawler Python Script


A streaming tweets crawler for collecting tweets related to 2016 US presidential election.
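As a rough sketch of how such a crawler can work (the complete script is at the link below), the tweepy library can stream tweets that match election-related keywords and append them to a local file. The credentials, file name and keyword list here are placeholders:

import json
import tweepy

# placeholder credentials -- supply your own Twitter API keys
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')

class ElectionListener(tweepy.StreamListener):
    def on_status(self, status):
        # append each matching tweet's JSON payload to a local file
        with open('election_tweets.json', 'a') as f:
            f.write(json.dumps(status._json) + '\n')

    def on_error(self, status_code):
        # stop streaming if Twitter starts rate-limiting us (HTTP 420)
        return status_code != 420

stream = tweepy.Stream(auth=auth, listener=ElectionListener())
stream.filter(track=['election2016', 'Trump', 'Hillary Clinton'])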

Check out the above Python script at the web link below.

2016 US Presidential Election Tweets Crawler

Happy Hadooping with Patrick..

Saturday, 30 July 2016

Python : Python Script that Parses "Jobs"


Python Script that Parses "Jobs"

This small Python program scrapes a 'Hiring Now' page on Hacker News (or any other jobs website) and saves only the jobs that mention certain keywords, e.g. 'New York' or 'San Francisco'. You can also use the keywords to find specific jobs, e.g. 'Machine Learning'.
You need to install Beautiful Soup 4 in order to use this program:
$ pip install beautifulsoup4
Tested with Python 2.7 (32-bit).
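As a minimal sketch of the approach (the actual script is in the download below), the page can be fetched with urllib2 and parsed with Beautiful Soup, keeping only the postings that mention one of your keywords. The thread URL is a placeholder and the 'commtext' class is an assumption about Hacker News comment markup:

import urllib2
from bs4 import BeautifulSoup

KEYWORDS = ['New York', 'San Francisco', 'Machine Learning']
URL = 'https://news.ycombinator.com/item?id=0000000'   # placeholder: a 'Who is hiring?' thread

html = urllib2.urlopen(URL).read()
soup = BeautifulSoup(html, 'html.parser')

# keep only the comments (job postings) that mention one of the keywords
for comment in soup.find_all('span', class_='commtext'):
    text = comment.get_text()
    if any(keyword.lower() in text.lower() for keyword in KEYWORDS):
        with open('jobs.txt', 'a') as f:
            f.write(text.encode('utf-8') + '\n\n')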

Problems

  • It will grab jobs that mention any of the keywords in your list.
  • It will break if someone creates a link with 'More' as the text :(
Downloads


I hope this tutorial will surely help you. If you have any questions or problems please let me know.

Happy Hadooping with Patrick..

Wednesday, 13 July 2016

Python : Python Script that Parses PDF files




Python Script that Parses PDF files
PDF documents are beautiful things, but that beauty is often only skin deep. Inside, they can hold any number of structures that are difficult to understand and exasperating to get at. In the end, a PDF is really meant to be read, not to have its internals messed with. Below is a Python script through which you can get the contents of a PDF.
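For example, here is a minimal sketch using the PyPDF2 library (not necessarily the same approach as the script in the download below); the file name is a placeholder:

import PyPDF2

# open the PDF in binary mode and extract the text of each page
with open('example.pdf', 'rb') as f:        # placeholder file name
    reader = PyPDF2.PdfFileReader(f)
    for page_num in range(reader.getNumPages()):
        page = reader.getPage(page_num)
        print(page.extractText())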


DOWNLOADS:

Parsing PDF Python Script

I hope this tutorial will surely help you. If you have any questions or problems please let me know.

Happy Hadooping with Patrick..

Saturday, 25 June 2016

Python : Analysis of Brexit Data


 Analysis of Brexit Data

British withdrawal from the European Union, often shortened to Brexit is a political goal that was pursued by various individuals, advocacy groups, and political parties since the United Kingdom (UK) joined the precursor of the European Union (EU) in 1973.

In [1]:
from __future__ import division
from bs4 import BeautifulSoup
import urllib2
import re
import string
In [2]:
# Scrape brexit data from BBC

areas = []
all_votes_leave = []
all_votes_remain = []
turnout_proportions = []

for letter in string.ascii_lowercase:
    try:
        response = urllib2.urlopen('http://www.bbc.co.uk/news/politics/eu_referendum/results/local/' + letter)
        html = response.read()
        soup = BeautifulSoup(html, "lxml")
    except:
        continue
    for results in soup.find_all(attrs={"class": "eu-ref-result-bar"}):
        area = results.find(attrs={"class": "eu-ref-result-bar__title"}).text
        tmp = results.find(attrs={"class": "eu-ref-result-bar__party eu-ref-result-bar__party--leave"})
        tmp = tmp.find(attrs={"class": "eu-ref-result-bar__votes"})
        tmp = re.sub('[a-zA-Z, \n]', '', tmp.text)
        votes_leave = int(tmp)
        tmp = results.find(attrs={"class": "eu-ref-result-bar__party eu-ref-result-bar__party--remain"})
        tmp = tmp.find(attrs={"class": "eu-ref-result-bar__votes"})
        tmp = re.sub('[a-zA-Z, \n]', '', tmp.text)
        votes_remain = int(tmp)
        tmp = results.find(attrs={"class": "eu-ref-result-bar__turnout"})
        tmp = re.sub('[a-zA-Z, \n%:]', '', tmp.text)
        turnout_proportion = float(tmp) / 100
        print(area, votes_leave, votes_remain, turnout_proportion)
        areas.append(area)
        all_votes_leave.append(votes_leave)
        all_votes_remain.append(votes_remain)
        turnout_proportions.append(turnout_proportion)
(u'Aberdeen City', 40729, 63985, 0.679)
(u'Aberdeenshire', 62516, 76445, 0.706)
(u'Adur', 20315, 16914, 0.764)
(u'Allerdale', 31809, 22429, 0.7290000000000001)
(u'Amber Valley', 44501, 29319, 0.763)
(u'Angus', 26511, 32747, 0.68)
(u'Argyll & Bute', 19202, 29494, 0.731)
(u'Arun', 56936, 34193, 0.778)
(u'Ashfield', 46720, 20179, 0.728)
(u'Ashford', 41472, 28314, 0.7709999999999999)
(u'Aylesbury Vale', 53956, 52877, 0.784)
(u'Babergh', 29933, 25309, 0.782)
(u'Barking & Dagenham', 46130, 27750, 0.638)
(u'Barnet', 60823, 100210, 0.721)
(u'Barnsley', 83958, 38951, 0.6990000000000001)
(u'Barrow-in-Furness', 21867, 14207, 0.6779999999999999)
(u'Basildon', 67251, 30748, 0.738)
(u'Basingstoke and Deane', 52071, 48257, 0.78)
(u'Bassetlaw', 43392, 20575, 0.748)
(u'Bath & North East Somerset', 44352, 60878, 0.7709999999999999)
(u'Bedford', 44569, 41497, 0.72)
(u'Belfast East', 21918, 20728, 0.665)
(u'Belfast North', 19844, 20128, 0.575)
(u'Belfast South', 13596, 30960, 0.6759999999999999)
(u'Belfast West', 8092, 23099, 0.489)
(u'Bexley', 80886, 47603, 0.752)
(u'Birmingham', 227251, 223451, 0.637)
(u'Blaby', 33583, 22888, 0.765)
(u'Blackburn with Darwen', 36799, 28522, 0.652)
(u'Blackpool', 45146, 21781, 0.654)
(u'Blaenau Gwent', 21587, 13215, 0.6809999999999999)
(u'Bolsover', 29730, 12242, 0.723)
(u'Bolton', 80491, 57589, 0.701)
(u'Boston', 22974, 7430, 0.772)
(u'Bournemouth', 50453, 41473, 0.6920000000000001)
(u'Bracknell Forest', 35002, 29888, 0.7609999999999999)
(u'Bradford', 123913, 104575, 0.667)
(u'Braintree', 52713, 33523, 0.7659999999999999)
(u'Breckland', 47235, 26313, 0.743)
(u'Brent', 48881, 72523, 0.65)
(u'Brentwood', 27627, 19077, 0.795)
(u'Bridgend', 40622, 33723, 0.711)
(u'Brighton & Hove', 46027, 100648, 0.74)
(u'Bristol', 87418, 141027, 0.731)
(u'Broadland', 42268, 35469, 0.7829999999999999)
(u'Bromley', 90034, 92398, 0.7879999999999999)
(u'Bromsgrove', 32563, 26252, 0.7929999999999999)
(u'Broxbourne', 33706, 17166, 0.737)
(u'Broxtowe', 35754, 29672, 0.7829999999999999)
(u'Burnley', 28854, 14462, 0.672)
(u'Bury', 54674, 46354, 0.713)
(u'Caerphilly', 53295, 39178, 0.7070000000000001)
(u'Calderdale', 58975, 46950, 0.71)
(u'Cambridge', 15117, 42682, 0.722)
(u'Camden', 23838, 71295, 0.654)
(u'Cannock Chase', 36894, 16684, 0.7140000000000001)
(u'Canterbury', 41879, 40169, 0.75)
(u'Cardiff', 67816, 101788, 0.696)
(u'Carlisle', 35895, 23788, 0.745)
(u'Carmarthenshire', 55381, 47654, 0.74)
(u'Castle Point', 37691, 14154, 0.753)
(u'Central Bedfordshire', 89134, 69670, 0.778)
(u'Ceredigion', 18031, 21711, 0.7440000000000001)
(u'Charnwood', 50672, 43500, 0.7040000000000001)
(u'Chelmsford', 53249, 47545, 0.7759999999999999)
(u'Cheltenham', 28932, 37081, 0.758)
(u'Cherwell', 41168, 40668, 0.755)
(u'Cheshire East', 113163, 107962, 0.773)
(u'Cheshire West and Chester', 98082, 95455, 0.745)
(u'Chesterfield', 34478, 22946, 0.7190000000000001)
(u'Chichester', 36326, 35011, 0.778)
(u'Chiltern', 26363, 32241, 0.835)
(u'Chorley', 36098, 27417, 0.755)
(u'Christchurch', 18268, 12782, 0.7929999999999999)
(u'City of London', 1087, 3312, 0.735)
(u'Clackmannanshire', 10736, 14691, 0.672)
(u'Colchester', 51305, 44414, 0.7509999999999999)
(u'Conwy', 35357, 30147, 0.7170000000000001)
(u'Copeland', 23528, 14419, 0.7)
(u'Corby', 20611, 11470, 0.741)
(u'Cornwall', 182665, 140540, 0.77)
(u'Cotswold', 26806, 28015, 0.7979999999999999)
(u'Coventry', 85097, 67967, 0.691)
(u'Craven', 18961, 16930, 0.81)
(u'Crawley', 31447, 22388, 0.732)
(u'Croydon', 78221, 92913, 0.698)
(u'Dacorum', 43702, 42542, 0.7909999999999999)
(u'Darlington', 30994, 24172, 0.71)
(u'Dartford', 35870, 19985, 0.755)
(u'Daventry', 28938, 20443, 0.809)
(u'Denbighshire', 28117, 23955, 0.691)
(u'Derby', 69043, 51612, 0.705)
(u'Derbyshire Dales', 24095, 22633, 0.8190000000000001)
(u'Doncaster', 104260, 46922, 0.695)
(u'Dover', 40410, 24606, 0.765)
(u'Dudley', 118446, 56780, 0.7170000000000001)
(u'Dumfries & Galloway', 38803, 43864, 0.7140000000000001)
(u'Dundee City', 26697, 39688, 0.629)
(u'Durham', 153877, 113521, 0.687)
(u'Ealing', 59017, 90024, 0.7)
(u'East Antrim', 22929, 18616, 0.652)
(u'East Ayrshire', 23942, 33891, 0.629)
(u'East Cambridgeshire', 24487, 23599, 0.77)
(u'East Devon', 48040, 40743, 0.789)
(u'East Dorset', 33702, 24786, 0.813)
(u'East Dunbartonshire', 17840, 44534, 0.7509999999999999)
(u'East Hampshire', 36576, 37346, 0.816)
(u'East Hertfordshire', 42994, 42372, 0.8029999999999999)
(u'East Lindsey', 56613, 23515, 0.7490000000000001)
(u'East Londonderry', 19455, 21098, 0.599)
(u'East Lothian', 19738, 36026, 0.7170000000000001)
(u'East Northamptonshire', 30894, 21680, 0.769)
(u'East Renfrewshire', 13596, 39345, 0.7609999999999999)
(u'East Riding of Yorkshire', 120136, 78779, 0.748)
(u'East Staffordshire', 39266, 22850, 0.743)
(u'Eastbourne', 30700, 22845, 0.747)
(u'Eastleigh', 39902, 36172, 0.782)
(u'Eden', 16911, 14807, 0.757)
(u'Edinburgh, City of', 64498, 187796, 0.7290000000000001)
(u'Eilean Siar', 6671, 8232, 0.701)
(u'Elmbridge', 31162, 45841, 0.7809999999999999)
(u'Enfield', 60481, 76425, 0.69)
(u'Epping Forest', 48176, 28676, 0.768)
(u'Epsom and Ewell', 21707, 23596, 0.804)
(u'Erewash', 40739, 25791, 0.76)
(u'Exeter', 28533, 35270, 0.738)
(u'Falkirk', 34271, 44987, 0.675)
(u'Fareham', 39525, 32210, 0.7959999999999999)
(u'Fenland', 37571, 15055, 0.737)
(u'Fermanagh & South Tyrone', 19958, 28200, 0.679)
(u'Fife', 75466, 106754, 0.667)
(u'Flintshire', 48930, 37867, 0.748)
(u'Forest Heath', 18160, 9791, 0.725)
(u'Forest of Dean', 30251, 21392, 0.774)
(u'Foyle', 8905, 32064, 0.574)
(u'Fylde', 26317, 19889, 0.755)
(u'Gateshead', 58529, 44429, 0.706)
(u'Gedling', 37542, 30035, 0.765)
(u'Gibraltar', 823, 19322, 0.835)
(u'Glasgow City', 84474, 168335, 0.562)
(u'Gloucester', 37776, 26801, 0.72)
(u'Gosport', 29456, 16671, 0.735)
(u'Gravesham', 35643, 18876, 0.7490000000000001)
(u'Great Yarmouth', 35844, 14284, 0.69)
(u'Greenwich', 52117, 65248, 0.695)
(u'Guildford', 34458, 44155, 0.769)
(u'Gwynedd', 25665, 35517, 0.723)
(u'Hackney', 22868, 83398, 0.6509999999999999)
(u'Halton', 37327, 27678, 0.682)
(u'Hambleton', 29502, 25480, 0.784)
(u'Hammersmith & Fulham', 24054, 56188, 0.6990000000000001)
(u'Harborough', 27850, 27028, 0.8140000000000001)
(u'Haringey', 25855, 79991, 0.705)
(u'Harlow', 29602, 13867, 0.735)
(u'Harrogate', 46374, 48211, 0.7879999999999999)
(u'Harrow', 53183, 64042, 0.722)
(u'Hart', 27513, 30282, 0.826)
(u'Hartlepool', 32071, 14029, 0.655)
(u'Hastings', 24339, 20011, 0.716)
(u'Havant', 44047, 26582, 0.741)
(u'Havering', 96885, 42201, 0.76)
(u'Herefordshire', 64122, 44148, 0.7829999999999999)
(u'Hertsmere', 28532, 27593, 0.7659999999999999)
(u'High Peak', 27717, 27116, 0.7559999999999999)
(u'Highland', 55349, 70308, 0.716)
(u'Hillingdon', 74982, 58040, 0.6890000000000001)
(u'Hinckley & Bosworth', 39501, 25969, 0.767)
(u'Horsham', 41303, 43785, 0.816)
(u'Hounslow', 56321, 58755, 0.6970000000000001)
(u'Huntingdonshire', 54198, 45729, 0.778)
(u'Hyndburn', 26568, 13569, 0.647)
(u'Inverclyde', 14010, 24688, 0.66)
(u'Ipswich', 38655, 27698, 0.725)
(u'Isle of Anglesey', 19333, 18618, 0.738)
(u'Isle of Wight', 49173, 30207, 0.723)
(u'Isles of Scilly', 621, 803, 0.792)
(u'Islington', 25180, 76420, 0.703)
(u'Kensington and Chelsea', 17138, 37601, 0.659)
(u'Kettering', 32877, 21030, 0.764)
(u"King's Lynn & West Norfolk", 56493, 28587, 0.747)
(u'Kingston upon Hull', 76646, 36709, 0.629)
(u'Kingston upon Thames', 32737, 52533, 0.7829999999999999)
(u'Kirklees', 118755, 98485, 0.7070000000000001)
(u'Knowsley', 36558, 34345, 0.635)
(u'Lagan Valley', 25704, 22710, 0.6659999999999999)
(u'Lambeth', 30340, 111584, 0.6729999999999999)
(u'Lancaster', 37309, 35732, 0.726)
(u'Leeds', 192474, 194863, 0.713)
(u'Leicester', 67992, 70980, 0.65)
(u'Lewes', 28508, 30974, 0.778)
(u'Lewisham', 37518, 86955, 0.63)
(u'Lichfield', 37214, 26064, 0.787)
(u'Lincoln', 24992, 18902, 0.693)
(u'Liverpool', 85101, 118453, 0.64)
(u'Luton', 47773, 36708, 0.662)
(u'Maidstone', 52365, 36762, 0.76)
(u'Maldon', 24302, 14529, 0.7909999999999999)
(u'Malvern Hills', 25294, 23203, 0.805)
(u'Manchester', 79991, 121823, 0.597)
(u'Mansfield', 39927, 16417, 0.726)
(u'Medway', 88997, 49889, 0.721)
(u'Melton', 17610, 12695, 0.813)
(u'Mendip', 32028, 33427, 0.769)
(u'Merthyr Tydfil', 16291, 12574, 0.674)
(u'Merton', 37097, 63003, 0.7340000000000001)
(u'Mid Devon', 25606, 22400, 0.7929999999999999)
(u'Mid Suffolk', 33794, 27391, 0.7809999999999999)
(u'Mid Sussex', 41057, 46471, 0.807)
(u'Mid Ulster', 16799, 25612, 0.617)
(u'Middlesbrough', 40177, 21181, 0.649)
(u'Midlothian', 17251, 28217, 0.6809999999999999)
(u'Milton Keynes', 67063, 63393, 0.736)
(u'Mole Valley', 25708, 29088, 0.821)
(u'Monmouthshire', 27569, 28061, 0.777)
(u'Moray', 23992, 24114, 0.674)
(u'Neath Port Talbot', 43001, 32651, 0.715)
(u'New Forest', 64541, 47199, 0.792)
(u'Newark and Sherwood', 40516, 26571, 0.768)
(u'Newcastle upon Tyne', 63598, 65405, 0.6759999999999999)
(u'Newcastle-under-Lyme', 43457, 25477, 0.743)
(u'Newham', 49371, 55328, 0.5920000000000001)
(u'Newport', 41236, 32413, 0.7020000000000001)
(u'Newry & Armagh', 18659, 31693, 0.637)
(u'North Antrim', 30938, 18782, 0.649)
(u'North Ayrshire', 29110, 38394, 0.6459999999999999)
(u'North Devon', 33100, 24931, 0.768)
(u'North Dorset', 23802, 18399, 0.797)
(u'North Down', 21046, 23131, 0.677)
(u'North East Derbyshire', 37235, 22075, 0.752)
(u'North East Lincolnshire', 55185, 23797, 0.679)
(u'North Hertfordshire', 35438, 42234, 0.782)
(u'North Kesteven', 42183, 25570, 0.784)
(u'North Lanarkshire', 59400, 95549, 0.609)
(u'North Lincolnshire', 58915, 29947, 0.7190000000000001)
(u'North Norfolk', 37576, 26214, 0.768)
(u'North Somerset', 64976, 59572, 0.774)
(u'North Tyneside', 60589, 52873, 0.723)
(u'North Warwickshire', 25385, 12569, 0.762)
(u'North West Leicestershire', 34969, 22642, 0.779)
(u'Northampton', 61454, 43805, 0.726)
(u'Northumberland', 96699, 82022, 0.743)
(u'Norwich', 29040, 37326, 0.691)
(u'Nottingham', 61343, 59318, 0.618)
(u'Nuneaton and Bedworth', 46095, 23736, 0.743)
(u'Oadby & Wigston', 17173, 14292, 0.737)
(u'Oldham', 65369, 42034, 0.679)
(u'Orkney Islands', 4193, 7189, 0.6829999999999999)
(u'Oxford', 20913, 49424, 0.723)
(u'Pembrokeshire', 39155, 29367, 0.7440000000000001)
(u'Pendle', 28631, 16704, 0.7020000000000001)
(u'Perth and Kinross', 31614, 49641, 0.737)
(u'Peterborough', 53216, 34176, 0.723)
(u'Plymouth', 79997, 53458, 0.7140000000000001)
(u'Poole', 49707, 35741, 0.753)
(u'Portsmouth', 57336, 41384, 0.703)
(u'Powys', 42707, 36762, 0.77)
(u'Preston', 34518, 30227, 0.687)
(u'Purbeck', 16966, 11754, 0.789)
(u'Reading', 31382, 43385, 0.725)
(u'Redbridge', 59020, 69213, 0.675)
(u'Redcar & Cleveland', 48128, 24586, 0.7020000000000001)
(u'Redditch', 28579, 17303, 0.752)
(u'Reigate & Banstead', 40980, 40181, 0.782)
(u'Renfrewshire', 31010, 57119, 0.6920000000000001)
(u'Rhondda Cynon Taf', 62590, 53973, 0.674)
(u'Ribble Valley', 20550, 15892, 0.79)
(u'Richmond upon Thames', 33410, 75396, 0.82)
(u'Richmondshire', 15691, 11945, 0.7509999999999999)
(u'Rochdale', 62014, 41217, 0.659)
(u'Rochford', 34937, 17510, 0.7879999999999999)
(u'Rossendale', 23169, 15012, 0.7240000000000001)
(u'Rother', 33753, 23916, 0.7929999999999999)
(u'Rotherham', 93272, 44115, 0.695)
(u'Royal Borough of Windsor and Maidenhead', 37706, 44086, 0.797)
(u'Rugby', 33199, 25350, 0.79)
(u'Runnymede', 24035, 20259, 0.76)
(u'Rushcliffe', 29888, 40522, 0.815)
(u'Rushmoor', 28396, 20384, 0.741)
(u'Rutland', 11613, 11353, 0.7809999999999999)
(u'Ryedale', 17710, 14340, 0.772)
(u'Salford', 62385, 47430, 0.632)
(u'Sandwell', 98250, 49004, 0.665)
(u'Scarborough', 37512, 22999, 0.73)
(u'Scottish Borders', 26962, 37952, 0.7340000000000001)
(u'Sedgemoor', 41869, 26545, 0.763)
(u'Sefton', 71176, 76702, 0.7170000000000001)
(u'Selby', 30532, 21071, 0.7909999999999999)
(u'Sevenoaks', 38258, 32091, 0.8059999999999999)
(u'Sheffield', 136018, 130735, 0.6729999999999999)
(u'Shepway', 37729, 22884, 0.7490000000000001)
(u'Shetland Islands', 5315, 6907, 0.703)
(u'Shropshire', 104166, 78987, 0.773)
(u'Slough', 29631, 24911, 0.621)
(u'Solihull', 68484, 53466, 0.76)
(u'South Antrim', 22055, 21498, 0.634)
(u'South Ayrshire', 25241, 36265, 0.698)
(u'South Bucks', 20647, 20077, 0.78)
(u'South Cambridgeshire', 37061, 56128, 0.812)
(u'South Derbyshire', 34216, 22479, 0.768)
(u'South Down', 15625, 32076, 0.624)
(u'South Gloucestershire', 83405, 74928, 0.762)
(u'South Hams', 26142, 29308, 0.802)
(u'South Holland', 36423, 13074, 0.753)
(u'South Kesteven', 49424, 33047, 0.782)
(u'South Lakeland', 30800, 34531, 0.797)
(u'South Lanarkshire', 60024, 102568, 0.653)
(u'South Norfolk', 41541, 38817, 0.785)
(u'South Northamptonshire', 30771, 25853, 0.794)
(u'South Oxfordshire', 37865, 46245, 0.807)
(u'South Ribble', 37318, 26406, 0.753)
(u'South Somerset', 56940, 42527, 0.7859999999999999)
(u'South Staffordshire', 43248, 23444, 0.778)
(u'South Tyneside', 49065, 30014, 0.682)
(u'Southampton', 57927, 49738, 0.6809999999999999)
(u'Southend-on-Sea', 54522, 39348, 0.728)
(u'Southwark', 35209, 94293, 0.6609999999999999)
(u'Spelthorne', 34135, 22474, 0.779)
(u'St Albans', 32237, 54208, 0.8240000000000001)
(u'St Edmundsbury', 35224, 26986, 0.767)
(u'St Helens', 54357, 39322, 0.688)
(u'Stafford', 43386, 34098, 0.778)
(u'Staffordshire Moorlands', 38684, 21076, 0.753)
(u'Stevenage', 27126, 18659, 0.737)
(u'Stirling', 15787, 33112, 0.74)
(u'Stockport', 77930, 85559, 0.7390000000000001)
(u'Stockton-on-Tees', 61982, 38433, 0.71)
(u'Stoke-on-Trent', 81563, 36027, 0.657)
(u'Strangford', 23383, 18727, 0.645)
(u'Stratford-on-Avon', 40817, 38341, 0.8079999999999999)
(u'Stroud', 33618, 40446, 0.8)
(u'Suffolk Coastal', 41966, 37218, 0.8059999999999999)
(u'Sunderland', 82394, 51930, 0.648)
(u'Surrey Heath', 26667, 25638, 0.7979999999999999)
(u'Sutton', 57241, 49319, 0.76)
(u'Swale', 47388, 28481, 0.742)
(u'Swansea', 61936, 58307, 0.695)
(u'Swindon', 61745, 51220, 0.758)
(u'Tameside', 67829, 43118, 0.66)
(u'Tamworth', 28424, 13705, 0.741)
(u'Tandridge', 27169, 24251, 0.8029999999999999)
(u'Taunton Deane', 34789, 30944, 0.7809999999999999)
(u'Teignbridge', 44363, 37949, 0.7929999999999999)
(u'Telford & Wrekin', 56649, 32954, 0.721)
(u'Tendring', 57447, 25210, 0.7440000000000001)
(u'Test Valley', 39091, 36170, 0.7959999999999999)
(u'Tewkesbury', 28568, 25084, 0.7909999999999999)
(u'Thanet', 46037, 26065, 0.728)
(u'The Vale of Glamorgan', 35628, 36681, 0.7609999999999999)
(u'Three Rivers', 27097, 25751, 0.784)
(u'Thurrock', 57765, 22151, 0.727)
(u'Tonbridge & Malling', 41229, 32792, 0.7959999999999999)
(u'Torbay', 47889, 27935, 0.736)
(u'Torfaen', 28781, 19363, 0.698)
(u'Torridge', 25200, 16229, 0.7829999999999999)
(u'Tower Hamlets', 35224, 73011, 0.645)
(u'Trafford', 53018, 72293, 0.758)
(u'Tunbridge Wells', 29320, 35676, 0.7909999999999999)
(u'Upper Bann', 27262, 24550, 0.638)
(u'Uttlesford', 26324, 25619, 0.802)
(u'Vale of White Horse', 33192, 43462, 0.8109999999999999)
(u'Wakefield', 116165, 58877, 0.711)
(u'Walsall', 92007, 43572, 0.696)
(u'Waltham Forest', 44395, 64156, 0.6659999999999999)
(u'Wandsworth', 39421, 118463, 0.7190000000000001)
(u'Warrington', 62487, 52657, 0.733)
(u'Warwick', 33642, 47976, 0.792)
(u'Watford', 23419, 23167, 0.716)
(u'Waveney', 41290, 24356, 0.726)
(u'Waverley', 31601, 44341, 0.823)
(u'Wealden', 52808, 44084, 0.8)
(u'Wellingborough', 25679, 15462, 0.754)
(u'Welwyn Hatfield', 31060, 27550, 0.75)
(u'West Berkshire', 44977, 48300, 0.799)
(u'West Devon', 18937, 16658, 0.812)
(u'West Dorset', 33267, 31924, 0.794)
(u'West Dunbartonshire', 16426, 26794, 0.639)
(u'West Lancashire', 35323, 28546, 0.7440000000000001)
(u'West Lindsey', 33847, 20906, 0.745)
(u'West Lothian', 36948, 51560, 0.6759999999999999)
(u'West Oxfordshire', 30435, 35236, 0.797)
(u'West Somerset', 13168, 8566, 0.7909999999999999)
(u'West Tyrone', 13274, 26765, 0.618)
(u'Westminster', 24268, 53928, 0.649)
(u'Weymouth and Portland', 23352, 14903, 0.758)
(u'Wigan', 104331, 58942, 0.6920000000000001)
(u'Wiltshire', 151637, 137258, 0.7879999999999999)
(u'Winchester', 29886, 42878, 0.812)
(u'Wirral', 83069, 88931, 0.7090000000000001)
(u'Woking', 24214, 31007, 0.774)
(u'Wokingham', 42229, 55272, 0.8)
(u'Wolverhampton', 73798, 44138, 0.675)
(u'Worcester', 29114, 25125, 0.738)
(u'Worthing', 32515, 28851, 0.754)
(u'Wrexham', 41544, 28822, 0.715)
(u'Wychavon', 44201, 32188, 0.8079999999999999)
(u'Wycombe', 45529, 49261, 0.757)
(u'Wyre', 40163, 22816, 0.746)
(u'Wyre Forest', 36392, 21240, 0.74)
(u'York', 45983, 63617, 0.706)
In [3]:
# Tally up votes

total_votes_leave = sum(all_votes_leave)
total_votes_remain = sum(all_votes_remain)
percent_brexit = total_votes_leave / (total_votes_leave + total_votes_remain)

print(percent_brexit)
0.518922595696
In [4]:
# Create adjusted votes - what if there had been 100% turnout with same percentage of leave / remain in each area

adjusted_all_votes_leave = [votes / prop for votes, prop in zip(all_votes_leave, turnout_proportions)]
adjusted_all_votes_remain = [votes / prop for votes, prop in zip(all_votes_remain, turnout_proportions)]

adjusted_total_votes_leave = sum(adjusted_all_votes_leave)
adjusted_total_votes_remain = sum(adjusted_all_votes_remain)
adjusted_percent_brexit = adjusted_total_votes_leave / (adjusted_total_votes_leave + adjusted_total_votes_remain)

print(adjusted_percent_brexit)
0.517143339465


I hope this tutorial will surely help you. If you have any questions or problems please let me know.
Happy Hadooping with Patrick..
            


Wednesday, 22 June 2016

Python : Python Script to Update and Show "ESPNCRICINFO.COM" live Scores.


Python Script to Update and Show "ESPNCRICINFO.COM" live Scores.

A small Python script that automatically refreshes and shows espncricinfo.com scores of live cricket matches after a set interval.

import requests
from bs4 import BeautifulSoup
from time import sleep

url = "http://static.espncricinfo.com/rss/livescores.xml"

while True:
    # fetch the live-scores RSS feed, retrying until we get a 200 response
    r = requests.get(url)
    while r.status_code != 200:
        r = requests.get(url)

    # parse the feed and pull out the match descriptions
    # (the first <description> is the feed's own blurb, so skip it)
    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find_all('description')
    score = data[1].text
    print(score)

    # wait a minute before refreshing
    sleep(60)


I hope this tutorial will surely help you. If you have any questions or problems please let me know.


Happy Hadooping with Patrick..

Tuesday, 21 June 2016

Python : Download Audio and Video from Youtube Python Script



Download Audio and Video from Youtube Python Script

A Python application that downloads audio or video songs from YouTube. It uses several libraries: BeautifulSoup for parsing, "requests" for making HTTP requests, and youtube-dl for downloading a YouTube video. First, create a file that stores the list of songs line by line, then run the application with the python command.
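As a rough sketch of the idea (the full application is in the download below), the youtube_dl module can search for and download every song listed in the text file; the file name and options are assumptions:

import youtube_dl

# download the best available audio for every song listed in songs.txt (one per line)
opts = {
    'format': 'bestaudio/best',         # prefer audio-only streams
    'outtmpl': '%(title)s.%(ext)s',     # name files after the video title
}

with open('songs.txt') as f:            # placeholder song-list file
    songs = [line.strip() for line in f if line.strip()]

with youtube_dl.YoutubeDL(opts) as ydl:
    for song in songs:
        # 'ytsearch1:' makes youtube-dl download the first YouTube search result
        ydl.download(['ytsearch1:' + song])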

Downloads :

Youtube Audio and Videos Downloader Script

I hope this tutorial will surely help you. If you have any questions or problems please let me know.


Happy Hadooping with Patrick..

Python : Data Mining/Web Crawling Script on Social Web (Facebook,Linkedin,Twitter,Flipkart)


Data Mining/Web Crawling Script on Social Web (Facebook,Linkedin,Twitter,Flipkart)

Social media is an amazing source of data just waiting to be analysed. The attached scripts provide helpful examples of how to mine a variety of data sources using Python, a powerful programming language that simplifies access to social networking APIs.
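As a tiny illustration of the web-crawling side (this is not one of the attached scripts), a product listing page can be fetched with requests and parsed with Beautiful Soup; the URL and the tag/attribute being searched are placeholders that would need to match the real page markup:

import requests
from bs4 import BeautifulSoup

# placeholder search URL -- substitute a real product listing page
url = 'https://www.flipkart.com/search?q=laptop'
html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text
soup = BeautifulSoup(html, 'html.parser')

# collect link titles as a crude stand-in for product names
for a in soup.find_all('a', title=True):
    print(a['title'])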

Downloads :

Python Scripts ( Facebook,Linkedin,Twitter,Web Crawling-Flipkart)


I hope this tutorial will surely help you. If you have any questions or problems please let me know.


Happy Hadooping with Patrick..

Sunday, 19 June 2016

Python : Visualizing UN Population Projections


Visualizing UN Population Projections example

The U.N. world population prospects data set depicts the U.N.’s projections for every country’s population, decade by decade through 2100. The 2015 revision was recently released, and I analyzed, visualized, and mapped the data.


In [1]:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
import matplotlib.colors as colors, matplotlib.colorbar as colorbar 
import matplotlib.cm as cm, matplotlib.font_manager as fm
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from mpl_toolkits.basemap import Basemap
%matplotlib inline
In [2]:
# define the fonts to use for plots
family = 'Myriad Pro'
title_font = fm.FontProperties(family=family, style='normal', size=24, weight='normal', stretch='normal')
label_font = fm.FontProperties(family=family, style='normal', size=18, weight='normal', stretch='normal')
ticks_font = fm.FontProperties(family=family, style='normal', size=16, weight='normal', stretch='normal')
annot_font = fm.FontProperties(family=family, style='normal', size=12, weight='normal', stretch='normal')
In [3]:
def get_colors(cmap, n, start=0., stop=1., alpha=1., reverse=False):
    '''return n-length list of rgba colors from the passed colormap name and alpha,
       limit extent by start/stop values and reverse list order if flag is true'''
    import matplotlib.cm as cm, numpy as np
    colors = [cm.get_cmap(cmap)(x) for x in np.linspace(start, stop, n)]
    colors = [(r, g, b, alpha) for r, g, b, _ in colors]
    return list(reversed(colors)) if reverse else colors

First, prep the data set

The World Population Prospects dataset, 2015 revision, is by the UN Department of Social and Economic Affairs, Population Division. See here for more info on ISO country codes.
I'll look at the UN's "medium variant" projections. For even wilder numbers, you can examine the "constant fertility" projections.
In [4]:
# choose which UN population prediction to use (ie, which tab in the excel file)
variant = 'MEDIUM VARIANT'
In [5]:
# load the population projections data and rename wonky column names
df_excel = pd.read_excel('data/WPP2015_POP_F01_1_TOTAL_POPULATION_BOTH_SEXES.XLS',
                         sheetname=variant, skiprows=range(16))
df_excel = df_excel.rename(columns={'Country code':'country_code', 'Major area, region, country or area *':'country_name'})
In [6]:
# load the ISO UN country codes and prepend zeros
country_codes = pd.read_csv('data/country_names_codes.csv', encoding='utf-8')
country_codes['country_code'] = country_codes['country_code'].astype(str).str.zfill(3)
In [7]:
# filter the excel data set by only those rows that appear in the list of country codes
# this filters out continent and regional scale entities
df_excel['country_code'] = df_excel['country_code'].astype(str).str.zfill(3)
df_full = df_excel[df_excel['country_code'].isin(country_codes['country_code'])].copy()
In [8]:
# retain only a subset of columns
cols = ['country_name', 'country_code', '2015', '2020', '2030', '2040', '2050', '2060', '2070', '2080', '2090', '2100']
df = df_full[cols].copy()
In [9]:
# clean up a couple of country names
df['country_name'] = df['country_name'].str.replace('United States of America', 'USA')
df['country_name'] = df['country_name'].str.replace('Russian Federation', 'Russia')
df['country_name'] = df['country_name'].str.replace('Democratic Republic of the Congo', 'Congo')
df['country_name'] = df['country_name'].str.replace('United Republic of Tanzania', 'Tanzania')
df['country_name'] = df['country_name'].str.replace('Other non-specified areas', 'Taiwan')
In [10]:
# multiply all numeric columns by 1000 to get population in units of 1 instead of 1000
f = lambda x: x if isinstance(x, str) else x * 1000
df = df.applymap(f)
In [11]:
# select the countries in africa for further analysis
africa = df.iloc[0:58]

# then sort df by current pop and reset index
df = df.sort_values(by='2015', ascending=True, inplace=False)
df = df.reset_index(drop=True)
df.head()
Out[11]:
   country_name                 country_code    2015    2020    2030    2040    2050    2060    2070    2080    2090    2100
0  Holy See                     336              800.0   804.0   805.0   814.0   824.0   838.0   835.0   830.0   825.0   825.0
1  Tokelau                      772             1250.0  1357.0  1449.0  1503.0  1568.0  1624.0  1632.0  1622.0  1580.0  1514.0
2  Niue                         570             1610.0  1621.0  1662.0  1725.0  1767.0  1803.0  1818.0  1797.0  1765.0  1721.0
3  Falkland Islands (Malvinas)  238             2903.0  2932.0  2935.0  2901.0  2866.0  2817.0  2761.0  2727.0  2693.0  2653.0
4  Saint Helena                 654             3961.0  4052.0  4224.0  4218.0  4107.0  4004.0  3901.0  3788.0  3619.0  3435.0

Next, examine the continent/supernational scale data

In [12]:
# extract the supernational regions, they are in upper-case
df_regions = df_excel[df_excel['country_name']==df_excel['country_name'].str.upper()]
df_regions = df_regions[cols].copy()
df_regions = df_regions.applymap(f)
In [13]:
df_regions = df_regions.drop(labels=0, axis=0)
df_regions['country_name'] = df_regions['country_name'].str.title()
df_regions.loc[187, 'country_name'] = 'Latin America'
In [14]:
# first re-index by area name for one line per country
df_plot = df_regions.copy()
df_plot.index = df_plot['country_name']
df_plot = df_plot.drop(['country_name'], axis=1)
df_plot = df_plot.transpose()
df_plot = df_plot.drop('country_code', axis=0)
In [15]:
color_list = get_colors('spectral', n=len(df_regions), start=0.1, stop=0.95, reverse=True)
In [16]:
fig, ax = plt.subplots(figsize=[9, 6])
lines = []
for col, c in zip(df_plot.columns, color_list):
    ax.plot(df_plot[col]/10**9, linewidth=4, alpha=0.6, marker='o', markeredgewidth=0, color=c)
    lines.append(col)
    
ax.set_title('UN Projected Population Growth of World Regions', fontproperties=label_font, y=1.01)
ax.set_xlabel('Year', fontproperties=label_font)
ax.set_ylabel('Population (Billions)', fontproperties=label_font)
ax.grid()
ax.set_axis_bgcolor('#ffffff')
ax.set_xlim([2015, 2100])
ax.legend(lines, loc='center right', bbox_to_anchor=(1.325,0.825))

fig.savefig('images/projected-pop-line-regions.png', dpi=96, bbox_inches='tight')
plt.show()

Now, plot population change as a line chart

In [17]:
# first re-index by area name for one line per country
df_plot = df.copy()
df_plot.index = df_plot['country_name']
df_plot = df_plot.drop(['country_name'], axis=1)

# only keep countries that exceed some minimum population threshold, and transpose df for line plotting
min_population = 10 * 10**6
df_plot = df_plot[df_plot['2015'] > min_population]
df_plot = df_plot.transpose()
df_plot = df_plot.drop('country_code', axis=0)
In [18]:
# get one color for each country's line
color_list = get_colors('spectral', n=len(df_plot.columns), start=0.15, stop=0.9)
In [19]:
fig, ax = plt.subplots(figsize=[10, 7])
for col, c in reversed(list(zip(df_plot.columns, color_list))):
    ax.plot(df_plot[col]/10**9, linewidth=2, alpha=0.6, marker='.', markeredgewidth=0, color=c)
    
ax.set_title('UN Projected Population Growth of Large Countries', fontproperties=label_font, y=1.01)
ax.set_xlabel('Year', fontproperties=label_font)
ax.set_ylabel('Population (Billions)', fontproperties=label_font)
ax.grid()
ax.set_axis_bgcolor('#ffffff')
ax.set_xlim([2015, 2100])

# annotate the top 5 countries
def annotate(row):
    bbox = dict(boxstyle='round', color='w', fc='w', alpha=0.0)
    name = ' {}'.format(row['country_name'])
    pop = row['2100'] / 10**9
    ax.annotate(name, xy=(2100, pop), bbox=bbox, xycoords='data', fontproperties=annot_font)
df.sort_values(by='2100', ascending=False).head().apply(annotate, axis=1)

fig.savefig('images/projected-pop-line.png', dpi=96, bbox_inches='tight')
plt.show()
In [20]:
# log the populations
df_log_plot = df_plot.applymap(np.log)
In [21]:
# plot again, this time with the log values
fig, ax = plt.subplots(figsize=[10, 7])
for col, c in reversed(list(zip(df_log_plot.columns, color_list))):
    ax.plot(df_log_plot[col], linewidth=2, alpha=0.6, marker='.', markeredgewidth=0, color=c)
    
ax.set_title('UN Projected Population Growth of Large Countries', fontproperties=label_font, y=1.01)
ax.set_xlabel('Year', fontproperties=label_font)
ax.set_ylabel('Population (Log)', fontproperties=label_font)
ax.grid()
ax.set_axis_bgcolor('#ffffff')
ax.set_xlim([2015, 2100])

# annotate the top 5 countries
def annotate(row):
    bbox = dict(boxstyle='round', color='w', fc='w', alpha=0.0)
    name = ' {}'.format(row['country_name'])
    pop = np.log(int(row['2100']))
    ax.annotate(name, xy=(2100, pop), bbox=bbox, xycoords='data', fontproperties=annot_font)
df['country_name'] = df['country_name'].str.replace('United States of America', 'USA')
df['country_name'] = df['country_name'].str.replace('Democratic Republic of the Congo', 'Congo')
df.sort_values(by='2100', ascending=False).head().apply(annotate, axis=1)

fig.savefig('images/projected-pop-log-line.png', dpi=96, bbox_inches='tight')
plt.show()

Now, visualize population change with a choropleth map

In [22]:
# calculate each country's percent change between 2015 and 2100
df['change'] = df['2100'] / df['2015']
In [23]:
# divide up the dataset into quantiles by population change
num_bins = 7
bin_labels = range(num_bins)
df['bin'] = pd.qcut(x=df['change'], q=num_bins, labels=bin_labels)
In [24]:
# create labels and ticks for the legend
def get_label(value):
    pct = int((value - 1) * 100)
    sign = '+' if pct > 0 else ''
    return '{}{:,}%'.format(sign, pct)

# the labels will be the mins of each bin and the max of the last bin
labels = [get_label(df[df['bin']==b]['change'].min()) for b in range(num_bins)]
labels.append(get_label(df[df['bin']==list(range(num_bins))[-1]]['change'].max()))
print(labels)

# add one more tick mark, for the max of the last bin
label_ticks = list(bin_labels) + [num_bins]
print(label_ticks)
['-54%', '-27%', '-5%', '+14%', '+34%', '+66%', '+198%', '+951%']
[0, 1, 2, 3, 4, 5, 6, 7]
In [25]:
# define map colors
cholorpleth_color_map = 'viridis'
water_color = '#dddddd'
border_color = '#ffffff'
no_data_color = '#ffffff'
map_boundary_color = '#999999'
In [26]:
# get a list of colors for our choropleth map
color_list = get_colors(cholorpleth_color_map, n=num_bins, start=0.0, stop=0.92, alpha=0.75)
In [27]:
# now create the map: first set up the plotting figure
shapefile = 'data/countries_110m/ne_110m_admin_0_countries'
fig = plt.figure(figsize=[14, 8])
ax = fig.add_subplot(111, axisbg=water_color, frame_on=False)
ax.set_title('UN Population Growth Projections, 2015-2100', fontproperties=title_font, y=1.01)

# draw the basemap and read the shapefile
m = Basemap(lon_0=0, projection='kav7', resolution=None)
m.drawmapboundary(color=map_boundary_color)
m.readshapefile(shapefile, name='shapes', drawbounds=False, default_encoding='ISO-8859-1')

# draw each shape, colored according to country's bin
for info, shape in zip(m.shapes_info, m.shapes):
    country_code = info['iso_n3']    
    if country_code in df['country_code'].values:
        bin_num = df[df['country_code']==country_code]['bin'].iloc[0]
        color = color_list[bin_num]
    else:
        color = no_data_color #if no data
    pc = PatchCollection([Polygon(np.array(shape), True)])
    pc.set_facecolor(color)
    pc.set_edgecolor(border_color)
    pc.set_linewidth(0.5)
    ax.add_collection(pc)

# draw color legend (lengths are ratios of figure size)
legend_width = 0.375
legend_left = (1 - legend_width) / 2.
legend_bottom = 0.07
legend_height = 0.03
legend_axis = fig.add_axes([legend_left, legend_bottom, legend_width, legend_height], zorder=2)
cmap = colors.ListedColormap(color_list)
cb = colorbar.ColorbarBase(legend_axis, cmap=cmap, ticks=label_ticks, boundaries=label_ticks, orientation='horizontal')
cb.ax.set_xticklabels(labels, fontproperties=annot_font)

fig.tight_layout()
fig.savefig('images/projected-pop-map.png', dpi=96)
plt.show()

Lastly, show some descriptive tables

In [28]:
africa_change = africa['2100'] - africa['2015']
world_change = df['2100'] - df['2015']

print('African population will grow by {:,.0f} people by 2100'.format(africa_change.sum()))
print('World population will grow by {:,.0f} people by 2100'.format(world_change.sum()))
print('That is {:,.0f} new people outside of Africa'.format(world_change.sum() - africa_change.sum()))
print('Africa accounts for {:,.1f}% of the projected growth'.format(africa_change.sum() * 100 / world_change.sum()))
African population will grow by 3,200,412,787 people by 2100
World population will grow by 3,863,845,383 people by 2100
That is 663,432,596 new people outside of Africa
Africa accounts for 82.8% of the projected growth
In [29]:
# format populations and changes nicely for display
def get_pct_str(value):
    pct = (value - 1) * 100
    sign = '+' if pct > 0 else ''
    return '{}{:,.0f}%'.format(sign, pct)

def get_change_str(value):
    sign = '+' if value > 0 else ''
    return '{}{:,.0f}'.format(sign, value)

def get_pop_str(value):
    return '{:,.0f}'.format(value)
In [30]:
change = df[['country_name', '2015', '2100', 'change']].copy()
change['% Change'] = change['change'].map(get_pct_str)
change['Pop Change'] = (change['2100'] - change['2015']).map(get_change_str)
change[['2015', '2100']] = change[['2015', '2100']].applymap(get_pop_str)
change = change.rename(columns={'country_name':'Country', '2015':'2015 Pop', '2100':'2100 Pop'})
In [31]:
# show the 10 countries projecting the greatest % population decline
change = change.sort_values(by='change', inplace=False).reset_index()
change.index = change.index.map(lambda x: x + 1)
change[['Country', '2015 Pop', '2100 Pop', 'Pop Change', '% Change']].head(10)
Out[31]:
    Country                    2015 Pop     2100 Pop     Pop Change    % Change
1   Republic of Moldova        4,068,897    1,855,779    -2,213,118    -54%
2   Bulgaria                   7,149,787    3,406,196    -3,743,591    -52%
3   Bosnia and Herzegovina     3,810,416    1,919,196    -1,891,220    -50%
4   Northern Mariana Islands   55,070       28,889       -26,181       -48%
5   Taiwan                     23,381,038   12,518,164   -10,862,874   -46%
6   Romania                    19,511,324   10,700,428   -8,810,896    -45%
7   Poland                     38,611,794   22,288,717   -16,323,077   -42%
8   Ukraine                    44,823,765   26,400,264   -18,423,501   -41%
9   Armenia                    3,017,712    1,793,392    -1,224,320    -41%
10  Puerto Rico                3,683,238    2,212,136    -1,471,102    -40%
In [32]:
# show the 10 countries projecting the greatest % population increase
change = change.sort_values(by='change', ascending=False, inplace=False).reset_index()
change.index = change.index.map(lambda x: x + 1)
change[['Country', '2015 Pop', '2100 Pop', 'Pop Change', '% Change']].head(10)
Out[32]:
    Country    2015 Pop     2100 Pop      Pop Change     % Change
1   Niger      19,899,120   209,334,454   +189,435,334   +952%
2   Zambia     16,211,767   104,868,893   +88,657,126    +547%
3   Burundi    11,178,921   62,661,944    +51,483,023    +461%
4   Tanzania   53,470,420   299,132,889   +245,662,469   +459%
5   Angola     25,021,974   138,737,554   +113,715,580   +454%
6   Somalia    10,787,104   58,310,946    +47,523,842    +441%
7   Mali       17,599,694   92,980,533    +75,380,839    +428%
8   Uganda     39,032,383   202,867,655   +163,835,272   +420%
9   Malawi     17,215,232   87,055,526    +69,840,294    +406%
10  Congo      77,266,814   388,732,857   +311,466,043   +403%
In [33]:
# show the most populous countries in 2015
most_populous_2015 = df.sort_values(by='2015', ascending=False).reset_index().copy()
most_populous_2015.index = most_populous_2015.index.map(lambda x: x + 1)
most_populous_2015['2015'] = most_populous_2015['2015'].map(lambda x: '{:,.0f}'.format(x))
most_populous_2015 = most_populous_2015.rename(columns={'country_name':'Country', '2015':'2015 Pop'})
most_populous_2015[['Country', '2015 Pop']].head(10)
Out[33]:
    Country      2015 Pop
1   China        1,376,048,943
2   India        1,311,050,527
3   USA          321,773,631
4   Indonesia    257,563,815
5   Brazil       207,847,528
6   Pakistan     188,924,874
7   Nigeria      182,201,962
8   Bangladesh   160,995,642
9   Russia       143,456,918
10  Mexico       127,017,224
In [34]:
# show the most populous countries in 2100
most_populous_2100 = df.sort_values(by='2100', ascending=False).reset_index().copy()
most_populous_2100.index = most_populous_2100.index.map(lambda x: x + 1)
most_populous_2100['2100'] = most_populous_2100['2100'].map(lambda x: '{:,.0f}'.format(x))
most_populous_2100 = most_populous_2100.rename(columns={'country_name':'Country', '2100':'2100 Pop'})
most_populous_2100[['Country', '2100 Pop']].head(10)
Out[34]:
    Country     2100 Pop
1   India       1,659,785,948
2   China       1,004,391,965
3   Nigeria     752,247,359
4   USA         450,384,823
5   Congo       388,732,857
6   Pakistan    364,282,652
7   Indonesia   313,648,122
8   Tanzania    299,132,889
9   Ethiopia    242,644,125
10  Niger       209,334,454
In [35]:
# lastly, make a table for the regions
regions_change = df_regions[['country_name', '2015', '2100']].copy()
regions_change['change'] = regions_change['2100'] / regions_change['2015']
regions_change['pop_change'] = regions_change['2100'] - regions_change['2015']

regions_change['% Change'] = regions_change['change'].map(get_pct_str)
regions_change['Pop Change'] = regions_change['pop_change'].map(get_change_str)
regions_change = regions_change.sort_values(by='pop_change', inplace=False, ascending=False).reset_index()

regions_change.index = regions_change.index.map(lambda x: x + 1)
regions_change[['2015', '2100']] = regions_change[['2015', '2100']].applymap(get_pop_str)
regions_change = regions_change.rename(columns={'country_name':'Region', '2015':'2015 Pop', '2100':'2100 Pop'})
regions_change[['Region', '2015 Pop', '2100 Pop', 'Pop Change', '% Change']]
Out[35]:
    Region             2015 Pop        2100 Pop        Pop Change       % Change
1   Africa             1,186,178,282   4,386,591,069   +3,200,412,787   +270%
2   Asia               4,393,296,014   4,888,652,982   +495,356,968     +11%
3   Northern America   357,838,036     500,143,198     +142,305,162     +40%
4   Latin America      634,386,567     721,224,187     +86,837,620      +14%
5   Oceania            39,331,130      71,128,695      +31,797,565      +81%
6   Europe             738,442,070     645,577,351     -92,864,719      -13%

I hope this tutorial will surely help you. If you have any questions or problems please let me know.


Happy Hadooping with Patrick..