Data Creating
Creating your own data gives you more control over many of the previously mentioned considerations. It also allows for a greater variety of possible projects, since you do not have to limit your work to the data others have already collected. In fact, one easy place to start is to simply create data about yourself or your experiences. Small-scale personal data collection can serve as a form of diary-keeping, or simply as an exploratory method requiring little or no technical knowledge.
If you are interested in telling personal data stories, Giorgia Lupi and Stefanie Posavec’s “Dear Data” project has provided inspiration for many. The main page of the project site also includes various videos that explain how they developed their visual language and collected data by hand.
Data artist R. Luke DuBois’ TED Talk “Insightful Human Portraits Made from Data” offers another great example of the benefits of not relying on previously existing datasets. The projects he presents use a range of technical and creative methods for data sourcing that might inspire you to scrape data yourself.
And, Mimi Onuoha’s 2016 artwork “The Library of Missing Datasets” powerfully speaks to the fact that while numerous datasets do exist, there are many important “missing datasets”: information that is not collected or engaged. Inspired by Onuoha’s work, perhaps you can create a data visualization on important narratives you find to be “missing” as well.
This section will offer a few basics on creating your own data. It doesn’t attempt to cover all possible ways forward, but offers a few options for you to test things out.
Making a Dataset
Here are a few beginner-to-intermediate ways forward for making a dataset:
Before you get going, a first recommendation would be to quickly read Caroline Sinders’ Feminist Data Set Toolkit, particularly “Chapter 2: Thinking through Feminist Data Collection and Creation.”
One suggestion would be to refer back to “A Brief Introduction to Working with Data and Spreadsheets” for any guidance on spreadsheets generally, and then to make a simple, individually collected table of personal data inspired by “Dear Data.” Having a dataset that is very basic, and that you understand because you made it, can help you start to think about what that data can show. If you are working with a coded visualization that you are unfamiliar with, the simplicity of the dataset will also let you know very quickly whether you “got it right.”
Working with surveys is another popular option for creating datasets. Here is one tutorial that may help you in that process: “Creating and Analyzing Data from a Survey” on MonkeyLearn.
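To make the idea of a small, hand-collected personal table concrete, here is a minimal sketch in Python. The observations (a made-up log of drinks over two days) and the column names `day`, `drink`, and `where` are invented purely for illustration; a spreadsheet would work just as well. The sketch writes the table out as CSV text and counts how often each drink appears.

```python
import csv
import io
from collections import Counter

# Hypothetical hand-collected observations: one row per drink.
observations = [
    {"day": "Mon", "drink": "coffee", "where": "home"},
    {"day": "Mon", "drink": "water", "where": "work"},
    {"day": "Tue", "drink": "coffee", "where": "cafe"},
    {"day": "Tue", "drink": "tea", "where": "home"},
    {"day": "Tue", "drink": "coffee", "where": "work"},
]


def to_csv(rows):
    """Return the rows as CSV text, starting with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["day", "drink", "where"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    return Counter(row[column] for row in rows)


csv_text = to_csv(observations)
drink_counts = count_by(observations, "drink")

print(csv_text)
print(drink_counts.most_common())
```

Because the table is tiny and you collected it yourself, you can check the counts by eye, which makes it easy to spot whether a later visualization is showing what you expect.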
Web-Scraping
Web-scraping is a very popular way to get data. It (very roughly) refers to using code to pull information from the web to use as data. There are various ways to do this, but here are a few resources to get you going:
Sam Lavigne’s tutorial site Scrapism is a great place to start. It covers the basics of working with the command line, Python, and the Beautiful Soup library.
The “Web-scraping with Beautiful Soup” tutorial by Programming Historian offers another lesson on Beautiful Soup for those who might want to reference more than one guide.
And the “Web-scraping a Table with GoogleDocs” post on Eagereyes might be useful specifically for tables.
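As a rough illustration of what "using code to pull information from the web" can look like, the sketch below extracts the rows of an HTML table. Two assumptions to note: it uses Python's built-in `html.parser` module rather than Beautiful Soup (which the tutorials above cover), so that it runs without installing anything, and it parses an invented HTML snippet (`SAMPLE_HTML`) instead of downloading a real page. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Invented sample page content; a real scraper would download this text.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # the row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_table(html):
    """Return the table in the given HTML as a list of rows."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows


table_rows = scrape_table(SAMPLE_HTML)
for row in table_rows:
    print(row)
```

Libraries such as Beautiful Soup do this kind of bookkeeping for you and cope much better with messy real-world pages, which is why the tutorials above are the better route for an actual project.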
Using APIs
Beyond web-scraping generally, APIs allow you to access data from web apps (for example, Twitter, The New York Times, Rotten Tomatoes, etc.) that you can then use in your project.
Here are a few beginner-to-intermediate resources on APIs:
If you are unfamiliar with APIs generally, you might first read “What Exactly is an API” by Perry Eising for a very brief intro.
The “Web APIs for Non-Programmers” post on School of Data is a good next stop for more background and practical information.
The Documenting the Now project is an excellent resource for working with social media data (esp. Twitter). Their twarc tool in particular is worth checking out.
The “Reshaping JSON” lesson on Programming Historian might also be useful.
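Many web APIs return their data as JSON, which is often nested and needs to be reshaped into flat rows before it fits a spreadsheet or a visualization. The sketch below shows one way to do that with Python's built-in `json` module. The response text, its `results` key, and the film fields are all invented for illustration and do not correspond to any real API; the function names are likewise hypothetical.

```python
import json

# Invented sample of the kind of JSON text a web API might return.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "First Film", "year": 2001,
     "ratings": {"critics": 91, "audience": 85}},
    {"title": "Second Film", "year": 2010,
     "ratings": {"critics": 64}}
  ]
}
"""


def flatten(record, prefix=""):
    """Turn nested dictionaries into one flat dictionary.

    Nested keys are joined with a dot, e.g. "ratings.critics".
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_rows(response_text):
    """Parse the JSON text and return one flat dictionary per result."""
    payload = json.loads(response_text)
    return [flatten(item) for item in payload["results"]]


film_rows = to_rows(SAMPLE_RESPONSE)
for row in film_rows:
    print(row)
```

Note that the second record has no audience rating, so its row simply lacks that column. Missing values like this are common in API data and worth deciding how to handle before you visualize anything.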
Read Further
Giorgia Lupi and Stefanie Posavec’s book Observe, Collect, Draw: Discover the Patterns in Your Everyday Life is recommended if you can find it at a library. It offers a visual guide, filled with prompts and activities, for creating personal datasets. However, if the book is not available to you, there are many great Lupi and Posavec resources (articles and videos) available for free online that can offer “the gist” of their work and method.
Many of Sam Lavigne’s artworks involve web scraping and, if this is a space you are interested in, can undoubtedly inspire your own data (art)work.
Programming Historian has many other lessons that might be of interest not only for data creation, but also for other tech skills you may be interested in developing. Note: the lessons on this site are aimed at a humanities audience in particular.
The “Data Acquisition for Beginners” post by Tactical Tech (part of their “Exposing the Invisible Kit”) is another useful and thorough resource for getting data.