Wednesday, March 6, 2013

Where are Appalachian State Students From?

I'm excited to present this, as it marks my first adventure into big data on the web. I found Tableau early last week and was eager to use it, so I went searching the web for some interesting data to visualize.

Being from the Boone area, I decided to check for ASU data first, which led me to the ASU student and faculty directory. When I saw the address field I immediately thought, "I wonder where all of these ASU students came from?" A quick search on Google produced no real answers, so I knew I had found my interesting piece of data.

After working with the search page and finding a way to capture all 17,622 student listings, I wrote a script in Google Chrome's built-in JavaScript console that painstakingly scraped the table on each page. I must admit, I'm just now getting into JavaScript and jQuery (thanks, Codecademy!), and the process took me 4 days and about 20 hours in total, for roughly 40 lines of code.

But it did exactly what I needed it to do. I added a 5-second delay between pages so as not to overwhelm the server, and paused every 50 pages or so to copy and paste the array it built into an Excel workbook. That step alone took another 4 hours.
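For anyone curious what a paced scraper like this looks like, here is a minimal JavaScript sketch. It is not the original script: `fetchPage` is a hypothetical function you would supply (for example, one that loads a results page and returns its table rows as arrays of strings), and the names `scrapeAll` and `toTsv` are made up for illustration. It waits 5 seconds between requests and hands off a batch every 50 pages as tab-separated text, which pastes into Excel as separate columns.

```javascript
// Hypothetical sketch, not the original script.
// fetchPage(pageNumber) is an assumed, caller-supplied function that returns
// a Promise of an array of rows, where each row is an array of cell strings.

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Turn rows into tab-separated text; stray tabs/newlines inside a cell are
// replaced with a space so each row stays on one line in Excel.
function toTsv(rows) {
  return rows
    .map((row) => row.map((cell) => String(cell).replace(/[\t\r\n]+/g, " ")).join("\t"))
    .join("\n");
}

async function scrapeAll(totalPages, fetchPage, options = {}) {
  const delayMs = options.delayMs ?? 5000;   // pause between requests (5 s)
  const batchSize = options.batchSize ?? 50; // hand off a batch every 50 pages
  const onBatch = options.onBatch ?? ((tsv) => console.log(tsv));

  let batch = [];
  for (let page = 1; page <= totalPages; page++) {
    const rows = await fetchPage(page);
    batch.push(...rows);

    // Flush every `batchSize` pages, plus whatever is left after the last page.
    if (page % batchSize === 0 || page === totalPages) {
      onBatch(toTsv(batch));
      batch = [];
    }
    // Be polite to the server: wait before requesting the next page.
    if (page < totalPages) {
      await sleep(delayMs);
    }
  }
}
```

Keeping `fetchPage` as a parameter separates the pacing and batching from the page-specific parsing, which also makes the loop easy to try out with a fake fetcher and a zero delay.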

After I had all 17,622 records, I organized, cleaned and filtered the data, which brought it down to a nice, even 17,400 records, of which 14,104 actually had address information.
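A clean-up pass like that can be sketched as follows. The record shape `{ name, address, phone }` and the function name `cleanRecords` are assumptions for illustration; the real directory columns may differ. The sketch trims every field, drops rows with no name (such as stray header rows), removes exact duplicates (for example, a page captured twice), and counts how many records have a non-empty address.

```javascript
// Hypothetical clean-up pass; the { name, address, phone } shape is assumed.
function cleanRecords(records) {
  const seen = new Set();
  const cleaned = [];
  for (const raw of records) {
    // Trim whitespace and treat missing fields as empty strings.
    const rec = {
      name: (raw.name ?? "").trim(),
      address: (raw.address ?? "").trim(),
      phone: (raw.phone ?? "").trim(),
    };
    if (rec.name === "") continue; // drop blank or header rows

    // Skip exact duplicates of a record already kept.
    const key = `${rec.name}|${rec.address}|${rec.phone}`;
    if (seen.has(key)) continue;
    seen.add(key);
    cleaned.push(rec);
  }
  const withAddress = cleaned.filter((r) => r.address !== "").length;
  return { cleaned, withAddress };
}
```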

And, unfortunately, I hit a huge setback. That address field was not an actual ZIP code; it was merely a record number. Why would they even provide that as public data?!

Without ZIP codes I was pretty much sunk, that is, until I realized that Tableau can map geographic locations by telephone area code!

Well, the list of student phone numbers in the SearchASU database is much, much smaller than the list of address records. There are only 803 of them, roughly 5% of the records. It's not great, but it's enough for a decent sample.
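Pulling the area code out of a phone number is mostly string handling. Here is a minimal sketch, assuming U.S.-style 10-digit numbers (optionally with a leading country code 1); `areaCode` and `tallyByAreaCode` are hypothetical names, not part of Tableau or the directory. Anything that doesn't reduce to 10 or 11 digits, such as a 7-digit local number, is skipped rather than guessed at.

```javascript
// Minimal sketch: extract a North American area code and count per code.
// Assumes 10-digit numbers, optionally prefixed with country code 1.
function areaCode(phone) {
  const digits = String(phone).replace(/\D/g, ""); // keep digits only
  if (digits.length === 11 && digits.startsWith("1")) return digits.slice(1, 4);
  if (digits.length === 10) return digits.slice(0, 3);
  return null; // not a recognizable 10-digit number
}

function tallyByAreaCode(phones) {
  const counts = new Map();
  for (const p of phones) {
    const code = areaCode(p);
    if (code === null) continue; // skip unusable numbers
    counts.set(code, (counts.get(code) ?? 0) + 1);
  }
  return counts;
}
```

The resulting `Map` of area code to count is the kind of two-column table that can then be plotted by area code.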

So without further ado, here is where Appalachian State students were from in Spring 2013.





You can also view the visualization on Tableau here.

Now, here are a few takeaways I got from this data:


  • Since this is area code information, it is likely that a not-so-small portion of the student population has switched to a local 828 number during the 4 or so years they live here, though I'm sure just as many, if not more, are genuinely from the area.
  • Likewise, foreign students probably have a local area code if they are listed at all.
  • As one might expect, the NC area codes have the highest totals on the chart, and surrounding states make up the rest of the top 10.
  • We have at least one person from Alaska!
  • This was definitely a fun little project and I'm hoping that I can get a hold of some more generic demographic data soon!
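Ranking the area codes and rolling them up by state, as in the takeaways above, can be sketched like this. It assumes a `Map` of area code to count; `topN` and `byState` are hypothetical names, and the `AREA_CODE_STATE` table is a small hand-picked subset covering NC, its neighboring states, and Alaska, not a complete area-code database, so any code missing from it lands in an `"unknown"` bucket.

```javascript
// Illustrative subset only; a real analysis needs a full area-code table.
const AREA_CODE_STATE = {
  "252": "NC", "336": "NC", "704": "NC", "828": "NC",
  "910": "NC", "919": "NC", "980": "NC", "984": "NC",
  "803": "SC", "843": "SC", "864": "SC",
  "423": "TN", "865": "TN",
  "276": "VA", "540": "VA", "804": "VA",
  "404": "GA",
  "907": "AK",
};

// Top n area codes by count; ties are broken by area code for stable output.
function topN(counts, n) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Sum the per-area-code counts into per-state totals.
function byState(counts) {
  const states = new Map();
  for (const [code, count] of counts) {
    const state = AREA_CODE_STATE[code] ?? "unknown";
    states.set(state, (states.get(state) ?? 0) + count);
  }
  return states;
}
```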
If you have any interesting demographic data that you'd like to present, shoot me an email and I'll see what I can do.


'Til next time!
