A Day in the Life of a SocialCops Data Analyst

My placement at the technology startup SocialCops has given me exciting insights into how organizations are using data to inform better decision-making. As part of the Data Team, I work with Decision Scientists who use primary and secondary data to build a data stack that stakeholders can draw on to make better decisions. I was recently featured on the SocialCops blog, where I describe a day in the life of a Data Analyst at SocialCops. Check out the full version of my post here:

A Day in the Life of a SocialCops Data Analyst

As a Data Analyst at SocialCops, every day I am motivated by three long-term goals: filling data gaps, building transparency in the way governments and organizations operate and, most importantly, making our data stack accessible to stakeholders so they can use data to make better decisions.

Filling the Data Gaps

On a daily basis, working towards our goal of powering better decisions involves extensive data collection, data cleaning, and the development of data wrangling tools. The Data Team collects and processes approximately 5,000 secondary data files per week spanning a variety of sectors, including health, agriculture, economics, demographics, and education.

We collect, clean and present data from these sectors in user-friendly formats that will soon be available through our open data platform.

Data Processing

Making secondary data available in a user-friendly format is a necessary step towards data transparency, accessibility and utilization.

Data sets available in PDF or image formats require several intermediate steps of data processing before data can be analysed. For example, one data set on District Domestic Product from 1999-2009 published by the Planning Commission is stored in 197 separate PDF files.

Other data sets are stored online and are difficult to retrieve all at once. For example, to get Gram Panchayat-level data from NREGA’s public data portal, our engineering team scraped around 40,000 separate files. These then had to be cleaned and appended through automated processes.
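As a rough illustration of what "cleaned and appended through automated processes" can involve, here is a minimal Python sketch. It is not the SocialCops pipeline: the function name `append_tables`, the column names, and the sample values are all hypothetical, and it assumes each scraped file is a small CSV whose columns may appear in a different order or be partly missing.

```python
import csv
import io


def append_tables(named_files):
    """Stack CSV tables with differing columns into one uniform list of rows.

    named_files: iterable of (source_name, file_like) pairs (hypothetical inputs).
    Returns (columns, rows), where every row has every column.
    """
    columns = ["source_file"]  # union of column names, in first-seen order
    rows = []
    for source_name, handle in named_files:
        for record in csv.DictReader(handle):
            # Trim stray whitespace from headers and values; skip overflow cells.
            clean = {k.strip(): (v or "").strip() for k, v in record.items() if k}
            clean["source_file"] = source_name  # keep provenance for auditing
            for name in clean:
                if name not in columns:
                    columns.append(name)
            rows.append(clean)
    # Columns missing from a given file are filled with an empty string.
    return columns, [{c: r.get(c, "") for c in columns} for r in rows]


# Two made-up files with different column orders.
file_a = io.StringIO("panchayat,households\nA ,120\n")
file_b = io.StringIO("households,panchayat,person_days\n85,B,4300\n")
columns, table = append_tables([("a.csv", file_a), ("b.csv", file_b)])
```

Recording the source file on every row is a design choice that makes it easier to trace a suspicious value back to the file it came from once tens of thousands of files have been merged.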

The bottom line: any stakeholder wishing to use these data sets would first need to figure out how to convert them to a final tabular format before performing an analysis. In the end, they may not use the data because it is inaccessible and, as a result, the insights the data could offer would be lost.

Data Sourcing

In addition to formatting challenges, secondary data is difficult to obtain at the district level.

For example, major surveys such as the National Sample Surveys are conducted at the district level. However, their findings are reported by the operating agencies (in PDF format) at the state or regional level. Unit-level data must be purchased, extracted, and analysed by someone with a data background before district-level analyses can be performed, and only if the sample size is adequate for a district-level analysis.

State-level data is inadequate, especially for stakeholders looking to make decisions about resource allocation and development planning. By making district-level data available, we empower leaders to incorporate data into their decisions.

Data Cleaning

Despite the buzz about “big data” and its benefits for governance and development, few holistic solutions exist for data cleaning.

We use a variety of data wrangling tools including R, Excel, Stata, Python, Tabula, and Smallpdf. No single tool solves all of our data cleaning challenges. We often cycle through several software programs in the process of cleaning a data set.

For example, we may download a batch of PDF files using the DownThemAll! Mozilla Firefox add-on, convert them to CSV files using Tabula, read the CSV files into R to automate the cleaning process, export these files from R to Excel, then add relevant metadata before saving the final file.
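The automated cleaning step in that workflow is done in R. Purely as an illustration, the sketch below shows the same kind of cleaning in Python instead. It assumes a CSV extracted from a PDF has untidy headers, blank rows, header rows repeated at page breaks, and numbers written with thousands separators. The function names and the sample figures are hypothetical, not real District Domestic Product values.

```python
import csv
import io
import re

# Matches plain numbers such as "8,210" or "12,345.5", but not "1999-2000".
NUMBER = re.compile(r"-?\d[\d,]*(\.\d+)?")


def normalise(name):
    """Turn a header like ' District Domestic Product ' into snake_case."""
    return name.strip().lower().replace(" ", "_")


def to_number(cell):
    """Convert number-like strings to float; leave all other text unchanged."""
    if NUMBER.fullmatch(cell):
        return float(cell.replace(",", ""))
    return cell


def clean_extracted_csv(handle):
    """Tidy one extracted CSV into a list of {column: value} rows."""
    reader = csv.reader(handle)
    header = [normalise(h) for h in next(reader)]
    cleaned = []
    for raw in reader:
        cells = [c.strip() for c in raw]
        if not any(cells):
            continue  # skip blank rows
        if [normalise(c) for c in cells] == header:
            continue  # skip header rows repeated at page breaks
        cleaned.append({name: to_number(cell) for name, cell in zip(header, cells)})
    return cleaned


# Made-up sample resembling the output of a PDF table extractor.
sample = io.StringIO(
    'District , Year ,District Domestic Product\n'
    'Pune,1999-2000,"12,345.5"\n'
    ',,\n'
    'District,Year,District Domestic Product\n'
    'Nashik,1999-2000,"8,210"\n'
)
rows = clean_extracted_csv(sample)
```

Whatever the language, scripting these rules once means they can be applied identically to every file in a batch rather than repeated by hand.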

Developing a Better Product

Unlike at other organisations, Data Analysts at SocialCops work with engineers on developing products to solve our data processing challenges in real time.

Our Engineering Team is developing in-house data cleaning software designed to make data wrangling possible for analysts at all levels of expertise. In addition to cleaning data with existing software, we also do data wrangling on our own platform. Working on our team’s platform lets us provide inputs that lead to almost instant solutions for our data cleaning challenges.

By working with the engineering team, we are coming closer to developing a holistic solution for data wrangling challenges, which will increase decision-making power through analysis in the future.

Building Intelligent Indices

Once data sets are cleaned and available at the district level, our data team creates indices that rank districts according to performance across sectors. We collaborate with sector experts to derive insights from our data stack and construct relevant indicators that form the basis of our indices.

For example, this month we worked with an agriculture expert to design an index that measures performance of districts across a holistic range of indicators related to productivity, assets, gender equality, and health in the agriculture sector.

We are developing a methodology for constructing indices that can be both replicated and executed within our data wrangling software, making index creation part of our decision-making platform.
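For readers curious about what ranking districts on an index can look like in practice, here is a minimal Python sketch. It assumes min-max scaling of each indicator and equal weights, a common textbook approach that is not necessarily the methodology described above. The function name, the indicator names, and the district values are all hypothetical.

```python
def build_index(indicators, higher_is_better):
    """Score and rank districts on an equal-weight composite index.

    indicators: {district: {indicator_name: value}}
    higher_is_better: {indicator_name: bool}, the direction of each indicator
    Returns (scores, ranked), with ranked listing districts from best to worst.
    """
    names = list(higher_is_better)
    scores = {district: 0.0 for district in indicators}
    for name in names:
        values = [indicators[d][name] for d in indicators]
        low, span = min(values), max(values) - min(values)
        for d in indicators:
            # Min-max scale to [0, 1]; a constant indicator gives 0.5 to everyone.
            scaled = 0.5 if span == 0 else (indicators[d][name] - low) / span
            if not higher_is_better[name]:
                scaled = 1.0 - scaled  # flip so that 1 is always "best"
            scores[d] += scaled / len(names)  # equal weights (an assumption)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return scores, ranked


# Made-up values for three unnamed districts.
data = {
    "District A": {"crop_yield": 2.0, "anaemia_rate": 50.0},
    "District B": {"crop_yield": 3.0, "anaemia_rate": 30.0},
    "District C": {"crop_yield": 4.0, "anaemia_rate": 45.0},
}
direction = {"crop_yield": True, "anaemia_rate": False}
scores, ranked = build_index(data, direction)
```

In real work the choice of indicators, their direction, and their weights is exactly where sector experts come in; equal weights are only a starting point.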

Sharing Our Learnings

While our data stack is not yet available through our open data platform (look for a launch soon!), we work closely with the Growth, Design and Viz teams to make our data insights publicly available through infographics, maps, and other data visualizations such as web dashboards.

For example, our agriculture data stack is available on a dashboard that enables the user to visualize district comparisons on a map using indicators from a variety of sectors including Crop Productivity, Agricultural Inputs, Nutrition, and Women’s Empowerment. We also file approximately 50 Right to Information applications (RTIs) per month and publish the results in an effort to fill gaps where data is incomplete or unavailable.

Growing My Learnings

As a Data Analyst at SocialCops, I am part of an organization that actively encourages learning and knowledge acquisition. A colleague from the Data Team leads bi-weekly sessions on R programming; I attend weekly “Teach on Thursday” sessions, where SocialCops team members teach concepts and ideas that help me grow professionally; and I am encouraged to take the time to learn new programs that will improve and scale my data analysis skills in the future.

As our data stack grows and our methodology for creating indices evolves, we also develop as professionals. We are encouraged to learn as we go and share knowledge across teams to reach our collective goal of powering India’s most important decisions.

One of the most exciting parts of my fellowship is the opportunity to learn new skills that are useful to decision-makers at all levels from Gram Panchayats to Anganwadi Workers to Central Government Officials. If you are looking for insights about data collection, cleaning, analysis, or visualization, please feel free to reach out to me. Data should be accessible to all stakeholders and we are happy to provide suggestions to those looking to incorporate data into their decision-making processes.

Through transformative international immersion experiences in Ghana and India, Lilianna became enthusiastic about the role of research to inform social initiatives in the developing world. In the summer of 2013 she interned with the anti-child trafficking organization Challenging Heights in Winneba, Ghana and spent the following year pursuing academic coursework and Hindi language instruction in India. She took courses on Indian Political Economy and Hindi in Pune, Maharashtra, interned with the education organization Akanksha Foundation, and lived with a host family. While interning with Akanksha Foundation, Lilianna conducted a policy review of her school's educational methodology through stakeholder interviews and classroom observation. Before returning to the United States to complete her bachelor's degree, she interned in Mumbai, Maharashtra with the corporation Infrastructure Leasing and Financial Services, Ltd. where she was a member of the Social Inclusion Group, a corporate social responsibility team. Lilianna developed multiple intervention plans for the corporation, the most notable of which included a campaign to assist laborers in the trucking industry. Upon returning from India, Lilianna further explored her interests in corporate social responsibility and the informal economy through independent studies at Grinnell College, utilizing Geographic Information Systems to visually illustrate her analysis of informal labor activity. Her research was selected for presentation at the Central States Anthropology Society Conference and the Grinnell College Student Research Symposium in the spring of 2015.
