Writing

Chicago Lobbyists Force-Directed Graph Visualization

I wanted to build a project using a government dataset to learn d3.js. I decided to use the Chicago lobbyist data after looking at the information available at ChicagoLobbyists.org.

Now, to help you understand how to create your first visualization, I’ll share how I created this project.

Picking the Chart Type

There are a lot of excellent examples of the visualizations you can create using d3.js. Overall, I wanted to display the relationships between lobbyists, clients, and agencies and the money associated with them. After evaluating the available examples, I decided to use a force layout for this project. Next, I read the well-written documentation to learn about the JSON data the layout requires.

Generating the JSON Data

I used a local copy of the ChicagoLobbyists.org Sinatra app to generate the JSON needed to create the visualization. Of course, my approach is not very efficient with its N+1 queries, but I wanted the quickest and most straightforward solution since I wasn’t very familiar with DataMapper and would only need to extract the data once.
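For readers who have not seen the shape of that JSON: the force layout reads a list of nodes and a list of links whose source and target are positions in the node list. Here is a minimal Python sketch of shaping rows into that structure. The row fields (lobbyist, client, amount), the sample data, and the function name are hypothetical and are not taken from the ChicagoLobbyists.org app, which is written in Ruby.

```python
import json


def build_graph(rows):
    """Turn (lobbyist, client, amount) rows into force-layout style nodes and links."""
    nodes = []      # one entry per unique lobbyist or client
    index_of = {}   # (type, name) -> position of that node in `nodes`
    links = []

    def node_index(kind, name):
        # Reuse the existing node if we have already seen this (type, name) pair.
        key = (kind, name)
        if key not in index_of:
            index_of[key] = len(nodes)
            nodes.append({"name": name, "type": kind})
        return index_of[key]

    for lobbyist, client, amount in rows:
        links.append({
            "source": node_index("lobbyist", lobbyist),
            "target": node_index("client", client),
            "value": amount,
        })
    return {"nodes": nodes, "links": links}


# Hypothetical sample rows, for illustration only.
sample_rows = [
    ("Lobbyist A", "Client X", 1000),
    ("Lobbyist A", "Client Y", 2500),
    ("Lobbyist B", "Client X", 500),
]
graph = build_graph(sample_rows)
graph_json = json.dumps(graph, indent=2)
```

Building the lookup once keeps this to a single pass over the rows, which is the opposite of the N+1 pattern described above.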

Learning and Using d3.js

D3.js has really good documentation. This helped me get started with the library and get accustomed to the way you should use it to build your visualizations. I was able to create a prototype of the visualization rather quickly. The first issue was dealing with the sheer number of nodes being created to represent all the lobbyists, clients, and agencies. I settled on just showing the top 50 paid lobbyists, but that still left me with nodes that were taking up a lot more vertical and horizontal space than I wanted. I addressed this by setting the layout’s charge to very repulsive and then setting the gravity to pull the nodes into the center. Unfortunately, having these values set this way causes the nodes to bounce around when dragging is enabled and you try to hover over a node. This seems like a bug and I plan on looking into what is causing that behavior.

Adjusting the charge and gravity that way organized the nodes into a circle and did a good job of spacing out the nodes and making the visualization take up less space. The problem was now that nodes were overlapping since certain nodes were larger than others. To solve this, I used the collision example from Mike Bostock’s presentation slides. Nodes no longer overlapped and the visualization was at a proper height and width.
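To illustrate the idea behind collision handling, here is a minimal Python sketch that pushes overlapping circles apart. It is not the code from the presentation slides: it checks every pair of nodes directly, which is simpler but slower than a spatial index, and the function name, the padding parameter, and the pass count are assumptions made for this example.

```python
import math


def resolve_collisions(nodes, padding=1.0, passes=10):
    """Push overlapping circles apart in place.

    nodes: list of dicts with "x", "y", and "r" (radius) keys.
    """
    for _ in range(passes):
        moved = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                a, b = nodes[i], nodes[j]
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy)
                min_dist = a["r"] + b["r"] + padding
                if dist >= min_dist:
                    continue  # far enough apart already
                if dist == 0:
                    ux, uy = 1.0, 0.0  # coincident centers: pick an arbitrary direction
                else:
                    ux, uy = dx / dist, dy / dist
                # Move each node half of the overlap, in opposite directions.
                shift = (min_dist - dist) / 2
                a["x"] -= ux * shift
                a["y"] -= uy * shift
                b["x"] += ux * shift
                b["y"] += uy * shift
                moved = True
        if not moved:
            break  # nothing overlapped during this pass
    return nodes


circles = [{"x": 0.0, "y": 0.0, "r": 5.0}, {"x": 4.0, "y": 0.0, "r": 5.0}]
resolve_collisions(circles, padding=0.0)
```

Because larger nodes have a larger radius, they claim more space, which is what stops big and small nodes from overlapping.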

Now, I had to make the connections of the individual nodes more visible and help the user understand the data. I decided to do this by fading out all the nodes and links that are not connected to the hovered node. I struggled with how to efficiently traverse the nodes and adjust the opacity of the ones I wanted to fade out. I asked a question about it on Stack Overflow, and the creator of d3.js was kind enough to write a really helpful response that answered my question and gave me a better understanding of how to use the library.
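One common way to avoid traversing the whole graph on every hover is to build a lookup of connected pairs once and then test membership. The Python sketch below shows that idea; it is an illustration under my own assumptions (function names, the 0.1 faded opacity, links given as index pairs) and is not a transcription of the Stack Overflow answer.

```python
def build_neighbor_lookup(links):
    """Return a set of (a, b) index pairs for every link, in both directions."""
    linked = set()
    for link in links:
        source, target = link["source"], link["target"]
        linked.add((source, target))
        linked.add((target, source))
    return linked


def node_opacities(node_count, links, hovered, faded=0.1):
    """Full opacity for the hovered node and its neighbors, faded for the rest."""
    linked = build_neighbor_lookup(links)
    return [
        1.0 if index == hovered or (hovered, index) in linked else faded
        for index in range(node_count)
    ]


# Hypothetical graph: node 0 is linked to nodes 1 and 2; node 3 is unconnected.
example_links = [{"source": 0, "target": 1}, {"source": 0, "target": 2}]
hover_first = node_opacities(4, example_links, hovered=0)
hover_second = node_opacities(4, example_links, hovered=1)
```

In a real page the lookup would be built once after loading the data rather than on every hover.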

The process of rendering all the starting frames while the graph reaches equilibrium was taking a lot of time. I found out that skipping the first few frames sped up the time to reach equilibrium and removed the distracting animation of the nodes bouncing all over the place. Initially, I was just skipping the first few frames, but I changed it to displaying a loading animation until the e.alpha cooling parameter reached a certain threshold.
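The logic of hiding the graph until it has cooled can be sketched in a few lines of Python. The starting value, decay factor, and thresholds below are assumptions chosen for illustration; check the values your version of d3.js actually uses before relying on them.

```python
def cooling_schedule(alpha=0.1, decay=0.99, reveal_below=0.03, stop_below=0.005):
    """Yield (alpha, visible) for each tick until the layout has cooled.

    visible is False while the loading animation should be shown and
    True once alpha has dropped below the reveal threshold.
    """
    while alpha >= stop_below:
        yield alpha, alpha < reveal_below
        alpha *= decay


frames = list(cooling_schedule())
hidden_ticks = sum(1 for _, visible in frames if not visible)
```

Here hidden_ticks counts how many ticks run behind the loading animation before the graph is revealed.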

Label Placement

Next, I had to decide how I would label the selected nodes. I toyed with a lot of ideas including force-based label placement, static placement of labels next to their node, and placing the labels in a static position on the SVG with a line connecting each label to the node it represents. Those solutions ended up being far too noisy and distracting because there are too many nodes to put any sort of text or more lines next to each one. Finally, I settled on putting the labels in the top left corner and not having any lines or numbered labels to connect them to their nodes. Just color coding the node types and allowing the user to use the arrow keys to scroll through the nodes was an effective solution that didn’t clutter the screen more than was necessary.

Colors

Initially, the visualization had some really bad color choices. I tried to follow the red and blue color scheme used by ChicagoLobbyists.org, but there wasn’t enough color variance and contrast, so it was difficult to tell the nodes apart. I used ColorBrewer 2.0 to find a color scheme that would work for this and I was happy with the results.

Conclusion

Thanks to bl.ocks.org, I was able to quickly put the code on gist.github.com and share it with others. I am really happy with the overall results and the reception I got. I look forward to building more projects with d3.js and other libraries.

View the Visualization

Force Ubuntu One to Rescan and Synchronize Files and Folders

I like using Ubuntu One for file synchronization because it offers 5GB of space for free. I ran into a few gotchas where Ubuntu One wouldn’t start synchronizing new files and folders and the output of u1sdtool --waiting was blank. The solution to this was to tell Ubuntu One to rescan the folder by running these commands at the terminal:

> u1sdtool --list-folders
Folder list:
id=x-x-x-x-x subscribed=True path=/path/to/folder

Then:
> u1sdtool --rescan-from-scratch=x-x-x-x-x

After telling Ubuntu One to rescan the folder that isn’t synchronizing, it will start to upload all the files and folders in that folder. This also helps when there are file conflicts, or when the output of u1sdtool --waiting or u1sdtool --current-transfers is empty even though there are files or folders on your computer that haven’t been synchronized.
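If you have several folders, the two steps can be scripted. The Python sketch below only parses text and builds the command; it assumes the folder list looks exactly like the sample output above (one "id=... subscribed=... path=..." line per folder), and the function names are my own. It does not run u1sdtool itself.

```python
import re

# Assumed line format, based on the sample output shown above.
FOLDER_LINE = re.compile(r"^\s*id=(\S+)\s+subscribed=(\S+)\s+path=(.*)$")


def parse_folders(output):
    """Extract folder id, subscription flag, and path from --list-folders output."""
    folders = []
    for line in output.splitlines():
        match = FOLDER_LINE.match(line)
        if match:
            folders.append({
                "id": match.group(1),
                "subscribed": match.group(2) == "True",
                "path": match.group(3).strip(),
            })
    return folders


def rescan_command(folder_id):
    """Build the argument list for rescanning one folder."""
    return ["u1sdtool", "--rescan-from-scratch=" + folder_id]


sample_output = "Folder list:\nid=x-x-x-x-x subscribed=True path=/path/to/folder\n"
folders = parse_folders(sample_output)
commands = [rescan_command(folder["id"]) for folder in folders]
```

Each entry in commands can then be passed to subprocess.call if you want to run the rescans from Python.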

How I built FOIAshare

This all started back in June when I read that Gabe Klein would be the new commissioner for the Chicago Department of Transportation. I’ve been a year-round bicyclist in the city for a few years now, and it was great to read about all of the improvements that were planned for the city. That led me to finally set up my Twitter account to follow city leadership and fellow developers. Shortly after, I heard about the Apps for Metro Chicago, Illinois competition. The commitment to open data and the potential for innovation for the city got me excited about the competition and I wanted to participate. I went to hacksalons and Open Government Meetups, and got to interact with a lot of smart people. At the time, I wasn’t able to put together an entry for the transportation or community round, but at the start of November, I made the time and a commitment to build an app for the Grand Challenge Round.

I started off by investigating the City of Chicago Data Portal, mapping, visualization, and data mining tools. I found the FOIA request logs interesting since they were regularly updated, had a substantial number of records, and were uniquely made available in bulk by Chicago. I initially wanted to build a scraper to fetch the data, but Christopher Groskopf had already built a great FOIA Firehose ScraperWiki scraper to pull in all of the request logs from the data portal. It was broken at the time, so I fixed the errors, made a few adjustments, and started refining the data. Eventually, I wanted bigger changes to the existing scraper, but I didn’t want to completely change it from its original intent. So, I created a local scraper using the ScraperWiki development tools and a SQLite database. I got the scraper to fetch data from as many FOIA request logs as I could find. A lot of work went into scraping and cleaning the data, and that is something that can still be improved. Currently, the data is scraped with Python, refined with Google Refine, and imported using Ruby. This can be updated to automatically fetch, refine, and import the data using Ruby. I left that task until after the deadline since I could have spent a week getting the application to import automatically and near perfectly, or spent that time to actually, you know, build the app. Luckily, I picked the latter.

Now that I had the data, it was time to start building the app. My first git commit of a default Rails 3.1 app with PostgreSQL support was on November 19th, 2011, two weeks before the competition deadline. Shortly after, I added haml, will_paginate, Twitter Bootstrap with bootstrap-sass, and friendly_id. Then, I started laying out the URLs, scaffolds, and visualizations. During those two weeks, I worked the majority of my waking hours and I actually had a substantial portion of the site built in a week. Every day I would lay out what I could accomplish in the time before the deadline. I focused on what was possible, clarifying the goals, doing the most with the least amount of code, getting it to work now, and ruthlessly cutting features that could compromise delivery.

Originally, I planned on deploying to a free Heroku instance, but the database was bigger than their 5 MB limit. I already had a Linode sitting around with another Rails app on it, so I decided I would be better off deploying to that machine. That process took much more time than I expected and, in retrospect, I may have been better off paying for the larger shared database at Heroku, but I didn’t want to pay for a service that I already had. After analyzing a month’s worth of traffic for FOIAshare, I will be able to make a more informed decision about hosting.

Building FOIAshare has been an exciting journey. There are still a lot of possibilities for its future and I look forward to seeing the response the application gets through the competition. Regardless of the outcome, I know that I have learned a lot more about government and software development. Overall, I am proud that I was able to take on and deliver such an ambitious personal project.

Random Hacks of Kindness Milwaukee

At my last Toastmasters meeting, a friend invited me to Random Hacks of Kindness in Milwaukee. I hadn’t been to Milwaukee or a hackathon in some time, so I was excited to visit the city and meet other developers.

I met my friend downtown at 6:30am, but since I was so preoccupied the night before with submitting my Apps for Metro Chicago, IL entry before the midnight deadline, I overlooked the fact that I had to buy a travel ticket in advance. I managed to get a ticket at the last minute and was on my way with about 5 hours of sleep under my belt. Halfway through the trip my friend asked me if I brought my ticket to the event. I didn’t even register! I was clearly flying by the seat of my pants on this one. At least I had my trusty T61 laptop.

We arrive and everything looks familiar. We get to the Grand Avenue Mall an hour before things start and I start remembering the times I would walk through that mall on my way to the Badger Bus (it was a good way to avoid the cold for a few blocks). The event starts, I fill out an impromptu name tag and everything goes great. I meet a lot of people and hear some talks on development cycles and mobile development. It was a fun way to start the day.

After the talks, the event continued at Bucketworks. I met up with my friends in the mall and we decided to walk from the Grand Avenue Mall to Bucketworks. It was cold, windy, and wet that day and, about halfway there, we realized that we didn’t see anyone else walking behind or ahead of us. “There must have been a shuttle”, “people probably drove”, and “where is this place?” I mentioned. Luckily, the thought of free Jimmy John’s kept us going and we eventually arrived, looking a bit worn, to a group of people who clearly did not walk from the mall. “You don’t build character in a shuttle” and “food tastes that much better after a journey like that” I quipped. Crazy Chicagoans.

After we settled in and dried off, the presentations began. I really enjoyed hearing people discuss the water problems they had. Any one of them was a possible candidate for hacking. I was overwhelmed by the number of topics that were available to work on and I wasn’t able to decide on just one. I decided to walk around and listen to more presentations. That brought me to the table of Jessie from Sweetwater Farms. He talked about aquaponics, a subject I had never heard of before but was very interested in. There were two other developers there, one who knew PHP and another who was a Django/Python developer. With my Ruby on Rails/Ruby knowledge added to the mix, we had all the web development programming language bases covered. Unfortunately, that prevented us from deciding on what to build the project in, and no consensus was reached about what or how to create the application. I wish I could have helped more, but ultimately, I wasn’t a good fit for that team. I walked around to see what other teams were working on, and I heard a fantastic tutorial about the basics of beekeeping from the gentleman who makes Beepods. I never expected that I would learn so much about bees that day.

Later, I started working on data from the Milwaukee Metropolitan Sewerage District. I heard that they were doing live water quality tracking and I was interested in the data. I eventually created a ScraperWiki scraper from the data sources I found and the plan was to create a visualization using that data. I wanted to use D3, but I didn’t have enough experience to build a visualization with it in such a short amount of time and on such little sleep. I saved the work and took a two-hour nap on the floor which was mostly me just closing my eyes while the commotion from the bar across the street kept me up (people were literally being carried down the street). After waking up, I continued to research D3, but wasn’t able to create a visualization before the deadline.

Since there were only a few options for travel coming home, we ended up taking an early bus which had us leave before the presentations. I wish we didn’t have to miss the presentations, but it was the only way we could get back at a reasonable time for a reasonable amount of money. I slept on the bus the whole way back. After getting on the CTA, I had a great conversation with a fellow passenger and we talked all the way from the Quincy to the Diversey stop. In a bizarre coincidence, we eventually found out that they knew someone who was at the hackathon. Small world.

After all of that, I definitely learned a few things for next time. Here is some advice for your next hackathon.

  1. Buy and reserve your travel and event tickets far in advance
  2. Research the problem domains and brainstorm possible hacks beforehand
  3. Get as much sleep as you can the night before
  4. Sleep during the event, don’t try to stay up all night
  5. Find a team that can benefit from your expertise, change teams if necessary
  6. Don’t try to learn an entirely new tool
  7. Bring something to sleep on if you’re going to stay the night
  8. Focus on doing the most with the least amount of code
  9. Bring food
  10. Take breaks (Pomodoro Technique)
  11. Make friends

Overall, it was a fun weekend and I am glad I met everyone there. I look forward to my next hacking event and thank you to Bucketworks and all the sponsors for hosting us and putting on a great event.

Save ScraperWiki Data to Local SQLite Database

Update: https://github.com/christophermanning/scraperwiki_local_python is now available and makes it easier to set up your local ScraperWiki environment.


ScraperWiki is great for aggregating data and making it publicly available. When developing a script locally, you can use https://github.com/onyxfish/fakerwiki to simulate saving to the database, but it doesn’t actually write to a local SQLite database.

Luckily, you can rewire your local copy of the Python development library for ScraperWiki to create and save to a local SQLite database with Three Easy Steps™.

First, overwrite scraperlibs/python/scraperwiki/datastore.py with:

Second, copy https://bitbucket.org/ScraperWiki/scraperwiki/src/dd2217221fc3/services/datastore/datalib.py to scraperlibs/python/scraperwiki/datalib.py

Finally, replace line 7 of scraperlibs/python/scraperwiki/__init__.py with:
import os
logfd = os.fdopen(1, 'w', 0)

Now, running scraperwiki.sqlite.save() will create a local database at working_directory/script_name/defaultdb.sqlite with the table name swdata.

If, after setting this up, you notice your script is spending a lot of time writing to the database, check how many times you are calling scraperwiki.sqlite.save. Each call to scraperwiki.sqlite.save commits the data to SQLite, which has its share of overhead. If you are calling it once for each row in a large file, it will take a lot longer than necessary.

Instead, you can pass a list of dictionaries to scraperwiki.sqlite.save to significantly reduce the time spent saving your data to the database. This approach works great in a local environment, but you have to consider the amount of memory that is being used when creating this large list of dictionaries since you have limited resources on ScraperWiki. There, it may make sense to call scraperwiki.sqlite.save once per row or to send many batches of records instead of one large batch which may exceed your resources and cause a SIGKILL (EXECUTIONSTATUS: [Run was interrupted]) to appear in your run history.
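A middle ground between one call per row and one huge list is to save in fixed-size batches. Here is a minimal Python sketch of that; the helper names and the batch size of 500 are assumptions, and the save function is passed in so the helper does not depend on any particular signature for scraperwiki.sqlite.save.

```python
def batches(rows, size):
    """Yield lists of at most `size` rows without loading everything into memory."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # the final, possibly smaller, batch


def save_in_batches(rows, save, size=500):
    """Call save(batch) once per batch and return the number of rows saved.

    `save` is any callable taking a list of dictionaries. With ScraperWiki it
    could wrap scraperwiki.sqlite.save, for example with a hypothetical "id"
    unique key: lambda batch: scraperwiki.sqlite.save(["id"], batch)
    """
    saved = 0
    for batch in batches(rows, size):
        save(batch)
        saved += len(batch)
    return saved


# Illustration with a stand-in save function that just records each batch.
recorded = []
total = save_in_batches(({"id": i} for i in range(5)), recorded.append, size=2)
```

Because rows can be a generator, only one batch is held in memory at a time, which is the part that matters when resources are limited.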