Save ScraperWiki Data to Local SQLite Database

Update: is now available and makes it easier to set up your local ScraperWiki environment.

ScraperWiki is great for aggregating data and making it publicly available. When developing a script locally, you can use simulate saving to the database, but it doesn't actually write to a local SQLite database.

Luckily, you can rewire your local copy of the Python development library for ScraperWiki to create and save to a local SQLite database with Three Easy Steps™.

First, overwrite scraperlibs/python/scraperwiki/ with:

400: Invalid request

Second, copy to scraperlibs/python/scraperwiki/

Finally, replace line 7 of scraperlibs/python/scraperwiki/ with: import os logfd = os.fdopen(1, 'w', 0)

Now, running will create a local database at working_directory/script_name/defaultdb.sqlite with the database name swdata.

If after setting this up you notice your script is spending a lot of time writing to the database, check how many times you are calling Each time you call it commits the data to SQLite which has its share of overhead and if you are calling that once for each row in large file, it will take a lot longer than necessary.

Instead, you can pass a list of dictionaries to to significantly cut the time spent saving your data to the database. This approach works great in a local environment, but you have to consider the amount of memory that is being used when creating this large list of dictionaries since you have limited resources on ScraperWiki. There, it may make sense to call once per row or to send many batches of records instead of one large batch which may exceed your resources and cause a SIGKILL (EXECUTIONSTATUS: [Run was interrupted]) to appear in your run history.


November 9, 2011