Monday, October 14, 2013

Big Data and Migration: What's in Store?

Unless you've been assiduously avoiding the internet for the last few years, you've probably heard the term "Big Data" thrown around and witnessed the breathless speculation that generally accompanies it. Interestingly, I haven't yet seen anyone talking about what the imminent big data revolution will mean for migration studies, a field defined by a distinct lack of data. By way of starting a conversation, allow me to offer my predictions about what the impact will likely be.

I posit two main impacts of big data on migration studies, policy, and politics: one positive, one negative.

Positive: 1. It will allow scholars, NGOs and activists to track flows of migration like never before, making humanitarian interventions easier, allowing us to fight back against fear-mongering false statistics in the media, and providing new ways of preventing statelessness and human trafficking.

Negative: 2. It will allow governments to track undocumented migrants with unheard-of ease, prevent refugee flows from entering their countries, and track remittances and travel in ways that put migrants at new risk.

In short, this proliferation of knowledge could easily cut both ways. In what follows, I will describe the above scenarios in greater detail and offer my opinions on how we can (attempt to) avoid the downsides of big data for migration.

How Big Data Provides Information about Migration

In Big Data: A Revolution That Will Transform How We Live, Work, and Think, authors Viktor Mayer-Schönberger and Kenneth Cukier describe how a proliferation of data, thanks largely to the internet, has made new advances in prediction and analysis possible (and exploitable by those quick enough to grasp how to use them). For example, Google, by analysing search data from past flu outbreaks, is able to make predictions about the next flu outbreaks with more accuracy than the CDC could ever have previously dreamt. Facebook uses your likes and interactions (along with other metrics) to guess what products you might be most interested in, just as Amazon, compiling the buying history of millions of customers, is able to make a much better guess about what you are going to buy than your book club.

The point is that massive amounts of data can often predict behavior more accurately than traditional polls or surveys, which rely on random samples. The power of big data to predict is far from being totally harnessed, but as more people use the internet and offer up information about themselves, their interests, moods, and consumer behavior, the analysis becomes more accurate.

So how exactly does this relate to migration data? In several ways:
  • Self-Provided Information: Ever changed your location on Facebook or Twitter to reflect a move? Tagged pictures from your European vacation with each country and city you visited? Checked in on Foursquare?
  • Passively Collected Information: As you've probably realized post-Snowden, information about flights, wired money transfers, international emails and text messages, and even GPS locations is stored and available to some for analysis.
  • Searches: If you have auto-complete turned on for Google, take a look at popular migration-related search chains. "Moving to San Francisco", "USA immigration requirements", "EU asylum lawyers": all of these could be indicators of intent to migrate or move. This information is also collected and saved.
It boggles the mind what might be possible by combining these data types. For example, why not take official state statistics on individuals from, say, Romania who are registered in, say, Berlin, and compare them against Facebook profiles that list hometowns in Romania and current locations in Berlin, to see who isn't counted?
Or predict the next refugee wave by looking for the same patterns in purchases, money transfers and search terms that preceded the last major wave.
Or connect the locations of recipients of text messages and emails to construct an international network and identify people likely to make the big move to join their family or spouse abroad. (If the NSA can do it, why not Frontex?)
Or, an even more sinister possibility: identify undocumented migrant clusters with greater accuracy than ever before by comparing identity and location data with government statistics on who is legally registered.
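
To make the first of these comparisons concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are made up, the function name registration_gaps is hypothetical, and no real data source or API is implied. It works only on aggregate counts per (origin, city) pair rather than on individuals.

```python
from collections import Counter

# Toy, entirely made-up records; no real data source is implied.
# Each self-reported profile is (hometown_country, current_city).
profiles = [
    ("Romania", "Berlin"),
    ("Romania", "Berlin"),
    ("Romania", "Berlin"),
    ("Bulgaria", "Berlin"),
    ("Romania", "Munich"),
]

# Hypothetical official registration counts, keyed the same way.
official_counts = {
    ("Romania", "Berlin"): 2,
    ("Bulgaria", "Berlin"): 1,
    ("Romania", "Munich"): 1,
}


def registration_gaps(profiles, official_counts):
    """Return self-reported count minus official count for each (origin, city) pair."""
    self_reported = Counter(profiles)
    keys = set(self_reported) | set(official_counts)
    return {
        key: self_reported.get(key, 0) - official_counts.get(key, 0)
        for key in keys
    }


gaps = registration_gaps(profiles, official_counts)
# A positive gap means more self-reported residents than official registrations.
print(gaps[("Romania", "Berlin")])
```

A real comparison would be far messier than this (duplicate or outdated profiles, people who never list a hometown), so a gap like this is at best a rough signal, not a count.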

I know what you're thinking: this might illuminate the behavior of rich kids with smartphones, but the most vulnerable and poor among us are likely to interact with technology far less. That may well be the case, but even the least technologically connected among us might use cell phones and send text messages, shop at companies that utilize data mining, or send money to family members using wire transfers. Further, the nature of Big Data makes this increasingly unimportant: the masses of data that are available make guesses about the rest of the population possible, and can identify trends that include you, even if you didn't contribute to the "research." As Mayer-Schönberger and Cukier point out, big volumes of data may be "messy" or contain lots of red herrings and inaccuracies, but their size tends to make them more accurate than samples regardless.

The Good News for Migration Studies
All this proliferation of data would be a marked departure for the field of migration studies. As Jeff Crisp wrote in his well-known 1999 article, "Who Has Counted the Refugees?":
Despite the centrality of statistics to the field of refugee studies, scholars working in this area have been remarkably inattentive to the issue of quantitative data. While all of the standard works on refugees are replete with numbers, few even begin to question the source or accuracy of those statistics. Scholars have generally been content to rely on figures offered by the two leading producers of refugee statistics - UNHCR and the US Committee for Refugees (USCR) - despite the fact that the figures presented by the two organizations very often differ!
Now, more than a decade later, a lack of statistics on migration and ethnicity continues to plague the field of migration and human rights more generally. We still rely on UNHCR estimates, and undocumented migrants are no closer to being counted in official registers than before. The Open Society Foundation has long been pushing for European governments to collect racially disaggregated statistics in order to comply with European law and illuminate inequality. In a recent blog post for Open Society ("Making 'Big Data' Work for Equality"), Costanza Hermanin pointed out the irony of the PRISM scandal at a time when European governments can't seem to collect basic data:
As the recent PRISM program scandal in the United States highlighted, corporations and governments can gather information of any kind about us. Your emails, the foods you like, where you travel, and your shopping preferences are all examples of personal data that can be mined for profiling purposes. It’s ironic, then, that when discussing ethnic minorities or people with disabilities in Europe, “no data available” is a common excuse for not doing more to fight discrimination and inequality.
In addition to making it more difficult to fight discrimination, a lack of statistics (or, just as bad, incomplete and inaccurate data) can easily be exploited by irresponsible media or right-wing politicians. As I pointed out in a recent article, when it came to media coverage of a supposed "influx" of Roma from Bulgaria and Romania into Germany, the thrilling headlines and distorted statistics may have had real-world consequences for citizens of the two countries.

I don't have the space (or will) to elaborate on all the different ways that more accurate quantitative data could impact the study of migration, but I think it's no exaggeration to say the impact could be major. In addition to providing evidence of discrimination and taking the air out of anti-immigrant hyperbole, it could mean any number of advances, such as identifying people at risk of human trafficking, determining what is most needed in a refugee camp, and learning much more than we now know about identity, diasporas and remittances.

Needless to say, much of this information could also be used for less-than-noble purposes by governments, corporations, and hate groups, so let me now turn to the downsides of big data.

The Bad News and What to Do About It

If you've followed me this far, I have no doubt that the thought has crossed your mind that this whole Big Data thing could also be really bad news for migrants. After all, one of the reasons that we lack statistics is that so many migrants, stateless persons and refugees are forced to live in the shadows, unable to claim recognition at the state level for fear of deportation under restrictive immigration laws. If it's bad now, how bad will it be when governments have practically unlimited means for tracking people and their movements? (I don't want to get into the many possibilities for unethical behavior here, lest I be charged with giving them ideas.)

Far be it from me to dissuade you from letting the Gattaca-like scenario unfold in your head. I certainly do think that this may be where we are heading. But that is why it is so important that the "good guys" (people interested in studying rather than criminalizing migration, as well as human rights advocates) are ahead of the curve and preparing for what Big Data will mean for us.

I would argue that this unprecedented opportunity to gather real-time information about migration might illuminate all sorts of policy alternatives to detention and deportation. If immigrant-rights advocates get a head start, we may have the chance to change hearts and minds, even laws, before the data is used in ways that violate human rights.

And to the extent that isn't possible, it will be necessary to prepare. From a legal perspective, it will be crucial to determine in what ways the tracking and identification of migrants is possible, so that we will know what sort of threat big data poses to due process, privacy and the right to freedom of movement. We will need to be familiar with the techniques used for tracking migrants or preventing family reunification so that we can develop strong arguments against them, even as we wish to access many of the same stats ourselves for study. We will need to know all the legal justifications for data mining, as well as all the possible legal protections to prevent it. (A few good PhDs on the subject would be a good start.) In short, in order to fight the dangers of big data, it will not do to turn a blind eye and hope that governments do the same.

Is big data a double-edged sword? Absolutely. But we ignore its potential impact on migrants and migration at our own peril. After all, at the end of the day we are just talking about information here. Data is neutral; it's all in how it's used that makes the difference. If we allow ourselves to be overtaken by governments and their contractors in the race to access and shape that information, then we are betting that governments will use big data for good, a bet that historical experience suggests is extremely naive.

I'd love to hear your thoughts and criticisms in the comments.