I hadn’t intended to write a series of posts on the intersection of social media and online identities, yet somehow, here we are with a third post. In a comment on my first post, Ferns raised the issue of transparency and how companies hide the ‘how it works’ aspect. That’s a fascinating topic in itself, so I wanted to circle back to it.
It’s clearly true that companies should do a better job of notifying users about what data they’re collecting. They don’t want to do that because there are only negative consequences for them. No user is going to say “I love that this huge faceless corporation knows all this stuff about me, but you’re missing out on a lot more private stuff I haven’t shared. Let me help you access that as well.” In reality, given more visibility into the data-gathering process, users are only going to want to add constraints, which in turn hurts the company’s product and its advertising revenue.
When it comes to the interpretation of the data – for example, why Facebook makes the friend suggestions that it does – the story is more complicated. Machine learning, and particularly deep learning, is driving a lot of innovation in big tech companies these days. Traditionally a software developer would analyze a problem and code up an algorithm to solve it. Now that same developer will specify the end result they want (these people are friends, these people are not friends), gather as much input data as they can (user location, hometown, school, posts they liked, etc.) and try to train a system to figure out the end result from the inputs. Typically this involves throwing a huge amount of computational power at the problem (which is why it has only recently become practical) and results in a black box that nobody really understands. Given the right inputs (e.g. data about users) this black box might be able to make excellent predictions about who is friends with whom, but it can be difficult to say exactly why it makes any single prediction. So when companies say it’s difficult to share why certain suggestions were made, they might not be lying. They might not know themselves.
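To make that concrete, here’s a minimal sketch of that workflow in Python. Everything in it is invented for illustration – the features, the data, and the model choice are hypothetical, not anything Facebook actually uses.

```python
# A minimal sketch of the "specify the outcome, gather features, train
# a model" workflow described above. All feature names and data are
# invented for illustration; this is not Facebook's actual pipeline.
from sklearn.ensemble import RandomForestClassifier

# Each row describes a *pair* of users: shared hometown, shared school,
# how many of each other's posts they liked, and how often their phones
# were seen in the same place.
X = [
    # same_hometown, same_school, mutual_likes, co_locations
    [1, 1, 25, 12],
    [0, 1,  3,  0],
    [0, 0,  0,  1],
    [1, 0, 14,  8],
]
y = [1, 0, 0, 1]  # 1 = the pair are friends, 0 = they are not

model = RandomForestClassifier(n_estimators=100).fit(X, y)

# The trained model can score new pairs, but explaining *why* it scores
# a particular pair highly is genuinely hard: the black box problem.
print(model.predict_proba([[0, 1, 20, 10]]))
```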
As an example of this, let’s consider the original case of the sex worker I talked about in my first post. I should be clear that I know nothing about this beyond the public articles, and I know nothing about Facebook’s internal algorithms or what data they have. This is speculation designed purely to illustrate the issue. That said, imagine if Facebook had access to the WiFi networks people’s phones connected to over time. Being on the same public network as someone else doesn’t mean much. Even repeatedly seeing the same networks at roughly the same time doesn’t mean much. Maybe you just happen to regularly go for coffee at the same time and place as some other random person. But repeatedly being on the same networks at the same time, in different places, over many months would be indicative of a possible relationship. That’s the kind of correlation a machine learning system could figure out. It’s also the kind of correlation that would occur for a sex worker regularly meeting the same group of clients at different hotels in a city.
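Purely to illustrate that speculation, here’s a toy sketch of how such a co-occurrence signal might be computed. The sightings, time buckets, and threshold are all made up; the point is just that one shared network means little, while the same pair showing up on many distinct networks is a strong signal.

```python
# Hypothetical sketch of the WiFi co-occurrence signal described above.
# All data and thresholds are invented; this only illustrates the shape
# of the correlation, not any real system.
from collections import defaultdict
from itertools import combinations

# (user, wifi_network, hour_bucket) sightings
sightings = [
    ("alice", "HotelA-Guest", 1), ("bob", "HotelA-Guest", 1),
    ("alice", "HotelB-Guest", 50), ("bob", "HotelB-Guest", 50),
    ("alice", "CafeCorner", 99), ("carol", "CafeCorner", 99),
]

# Group users seen on the same network in the same time bucket.
by_slot = defaultdict(set)
for user, network, hour in sightings:
    by_slot[(network, hour)].add(user)

# Count how many *distinct* networks each pair of users shares.
shared_networks = defaultdict(set)
for (network, hour), users in by_slot.items():
    for a, b in combinations(sorted(users), 2):
        shared_networks[(a, b)].add(network)

# One shared network (the cafe) means little; the same pair turning up
# together on many different networks over months is the signal.
for pair, networks in shared_networks.items():
    if len(networks) >= 2:
        print(pair, "co-occurred on", len(networks), "distinct networks")
```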
Apologies if anyone visited here with the crazy idea of reading posts about femdom. Hopefully I’ll get back on that track in the next day or two. In the meantime I’ll continue my theme of old school anonymity via masquerade style masks. This is the lovely Anne Hathaway, the one bright spot in the otherwise terrible Dark Knight Rises.
It’s true that deep learning is driving a lot of Big Data activity. However, it is aimed at creating models that will provide individuals with recommendations that are far from obvious. It’s also used to identify individuals with specific characteristics (classification) and to make predictions of all sorts.
The key, at least for me, is that the outputs of all this deep learning are models (programs) that do something with information about an individual who is currently present. I don’t think it is used to concentrate data about individuals. There are far cheaper ways to do that and, sadly, marketing companies have been doing it for over a decade.
Companies like Facebook and Google tend to go too far. I don’t think it is crass commercialism as much as it is enthusiasm from young technologists who think that because they can do/learn something, it makes sense to do it. Hopefully, they will mature before doing too much real damage.
I think it’s tough to know exactly which areas deep learning is going to have a significant impact on. It’s already being used for things like creating spam, detecting spam, facial recognition, improving search results, speech and handwriting recognition, recommendations, personal assistants, game cheating detection, credit card fraud detection, etc. Pretty much anything you can turn into a bunch of feature vectors and create a training set for is a potential candidate. It’ll be interesting to see if it ends up being a minor shift in technology or one of those seismic shifts that happen every decade or two.
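As a toy illustration of “turn it into feature vectors”, here’s a tiny spam classifier, spam being one of the examples on that list. The messages are invented; the vectorizer simply turns each message into word counts, which is exactly the kind of feature vector a model can train on.

```python
# Toy "feature vectors plus training set" example, using spam detection.
# The messages and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win free money now",
    "meeting moved to 3pm",
    "free prize claim now",
    "lunch tomorrow?",
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["claim your free money"])))
```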
As for the big companies, I’m possibly more cynical than you. I agree that young technologists will do things just because they can, but what they do ends up being filtered and expressed by the crass commercialism that is the corporate/capitalist culture. There is an endless supply of young technologists, and I don’t see them slowing down or the corporate cultures changing. But as I said, I might be too cynical.
Interesting topic and comment. Thanks for taking the time to share your thoughts.
-paltego
It’s not always about deep learning. One specific example is WhatsApp, the messaging service popular in Europe [I don’t know about the USA]. It was bought a few years ago by Facebook for about USD 19 bn. Initially the social media giant said it would respect users’ privacy, but hey, if you pay that much money, at some point you have to monetize your investment, which duly happened.
Imagine a person who doesn’t have a Facebook account but uses WhatsApp. Obviously the messaging service needs access to his or her contacts. What Facebook did, however, was compare the contact lists on individual users’ phones. Whenever two of those contacts both use Facebook and the estimated probability that they know each other is high enough, each is sent a “do you know her or him…” suggestion.
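For illustration only, here’s a hypothetical sketch of that contact-matching logic. The phone numbers and names are invented and the real system is surely more elaborate, but the core of it can be a join this simple.

```python
# Hypothetical sketch of contact matching: if two people both appear in
# someone's uploaded address book and both are on the platform, suggest
# them to each other. All numbers and names are invented.
contacts = {
    # phone owner -> numbers stored in their address book
    "+31-600-0001": {"+31-600-0002", "+31-600-0003"},
    "+31-600-0004": {"+31-600-0002", "+31-600-0003"},
}
facebook_users = {"+31-600-0002": "Anna", "+31-600-0003": "Ben"}

for owner, book in contacts.items():
    on_fb = sorted(n for n in book if n in facebook_users)
    for i, a in enumerate(on_fb):
        for b in on_fb[i + 1:]:
            # Neither Anna nor Ben chose this; a third party's address
            # book connected them.
            print(f"Suggest {facebook_users[a]} <-> {facebook_users[b]}"
                  f" (both in {owner}'s contacts)")
```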
This is something that actually happened over here in Europe. While it doesn’t explain what happened to the lady in the original article, it shows one way Facebook connects people against their wishes.
===
Before the internet there was this thing called six degrees of separation. Researchers gave test subjects letters addressed to people they didn’t know, and asked them to forward each letter to the person they thought was most likely to help it reach its addressee. On average it took about six people for a letter to reach its intended recipient.
Nowadays we leave so many digital traces that it is almost impossible to hide a secret identity. For the time being, two phones – one vanilla and one kink – might work, but what if Facebook buys Twitter and starts comparing IP addresses? I’m sure there are many far more unsettling examples out there.
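To end on a speculative sketch: if login records from two services ever ended up under one roof, linking a vanilla account to a kink account could be as simple as this. All accounts and addresses below are invented.

```python
# Speculative sketch: linking accounts across two services by shared
# login IPs. The accounts and addresses are invented (documentation IP
# ranges), and a real system would weigh timing and frequency too.
vanilla_logins = {"alice_v": {"203.0.113.7", "198.51.100.4"}}
kink_logins = {"mistress_x": {"203.0.113.7", "198.51.100.4"}}

for v_acct, v_ips in vanilla_logins.items():
    for k_acct, k_ips in kink_logins.items():
        shared = v_ips & k_ips
        if len(shared) >= 2:  # repeated overlap, not a one-off
            print(f"{v_acct} and {k_acct} may be the same person "
                  f"({len(shared)} shared IPs)")
```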