Analyze the social network of movie stars
Social network analysis is a branch of data science that allows the investigation of social structures using networks and graph theory. It can help to reveal patterns in voting preferences, aid the understanding of how ideas spread, and even help to model the spread of diseases.
A social network is made up of a set of nodes (usually people) joined by links, or edges, that describe their relationships. In this article we analyse the social network formed by movie actors. Each actor in this network is represented as a node. Pairs of actors are then joined by an edge if they are known to have appeared in a movie together. This information is taken from the Internet Movie Database (IMDb). Our analysis is carried out using the Python programming language and, in particular, the tools available in the NetworkX library.
example:
{
    "title": "Back to the Future",
    "cast": ["Michael J. Fox", "Christopher Lloyd", "Lea Thompson", "Crispin Glover", "Thomas F. Wilson", "Claudia Wells", "James Tolkan", "Marc McClure", "Wendie Jo Sperber"],
    "directors": ["Robert Zemeckis"],
    "producers": ["Bob Gale", "Neil Canton"],
    "companies": ["Amblin Entertainment", "Universal Pictures"],
    "year": 1985
}
Steps
- import packages
    import json
    import networkx as nx  # used below as nx
- create a movie list
    Movies = []
Each item in the list is a dictionary, so we can reach a movie by its index in the list and then read its fields by key.
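As a minimal sketch of this step, the snippet below parses a small JSON string into a list of dictionaries and reads values by index and by key. The inline `raw` string (with a shortened cast) and the variable name `first` are assumptions for illustration only; the original data file is not shown in this article.

```python
import json

# Hypothetical inline data standing in for the real movie file.
raw = """
[
  {"title": "Back to the Future",
   "cast": ["Michael J. Fox", "Christopher Lloyd", "Lea Thompson"],
   "directors": ["Robert Zemeckis"],
   "year": 1985}
]
"""

Movies = json.loads(raw)   # a list of dictionaries, one per movie

first = Movies[0]          # reach a movie by its index in the list
print(first["title"])      # read a field by its key
print(first["cast"][1])    # the second name in the cast list
```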
- search for the movies that meet a requirement
Movies can be looked up by a field and a value, for example by the name of an actor in the cast.
    def search_films(key, value):
[26,911,1084,1103,1161,1365,1402,1793,1980,2131,2210,2394,2395,2396,2397,2398,2399,2400,2401,2412,2422,2423,2424,2425,2436,2604,3143,3406,3491,3492,3912,4277,4470,4479,5039,5040,5340,6477,6579,7248,7902,7916,7920,8087,8705,9152,9257,9555,9642,9644,9645,10260,11121,11319,11327,11523,11525,11667,11840,11975,12316,12914,13011,13012,13013,13014,15651,15727,16454,19272,21174,21513,22601,23022,24986,25142,26267,28142,28188,30893,31846,35553,35973,36505,36549,36729,38362,38444,41311,41739,44043,44107,45903,51663,51822,56326,62100,71516,72118,74710,74964,75838,75924,78144,78301,78499,83908,89644,96923,97984,99572,100993,102035,109576]
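The body of `search_films` is not shown above. One possible version, assuming it returns the indices of the movies whose `key` field matches `value`, could look like the sketch below. The extra `movies` parameter and the three-record `sample` list are assumptions added so that the example runs on its own.

```python
def search_films(movies, key, value):
    """Return the indices of movies whose `key` field matches `value`."""
    matches = []
    for index, movie in enumerate(movies):
        field = movie.get(key)
        if isinstance(field, list):
            found = value in field   # list fields such as "cast" or "directors"
        else:
            found = field == value   # scalar fields such as "title" or "year"
        if found:
            matches.append(index)
    return matches


# Hypothetical sample data in the same shape as the record shown earlier.
sample = [
    {"title": "Back to the Future", "cast": ["Michael J. Fox", "Christopher Lloyd"], "year": 1985},
    {"title": "Who Framed Roger Rabbit", "cast": ["Bob Hoskins", "Christopher Lloyd"], "year": 1988},
    {"title": "Teen Wolf", "cast": ["Michael J. Fox"], "year": 1985},
]

print(search_films(sample, "cast", "Michael J. Fox"))  # [0, 2]
print(search_films(sample, "year", 1988))              # [1]
```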
- graph
① Start with an empty graph to hold the social network.
    G = nx.Graph()  # create a new, empty undirected graph
② In NetworkX, G.add_edge() adds an edge and G.add_node() adds a node. add_edge() also creates either endpoint if it is not in the graph yet.
    G.add_edge('Alice', 'Bob', title='abc')  # title is an edge attribute that labels the edge
③ Example of a complete graph:
    G.add_edge('Alice', 'Bob', title='abc')
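Because every pair of actors in the same movie should be joined, one way to build the complete graph for a cast is to loop over all pairs. The sketch below uses `itertools.combinations` for this. The helper name `add_movie` and the shortened cast list are assumptions for illustration, not the author's code.

```python
from itertools import combinations

import networkx as nx


def add_movie(G, title, cast):
    """Join every pair of actors in `cast` with an edge labelled by `title`."""
    for a, b in combinations(cast, 2):
        G.add_edge(a, b, title=title)


G = nx.Graph()
add_movie(G, "Back to the Future",
          ["Michael J. Fox", "Christopher Lloyd", "Lea Thompson", "Crispin Glover"])

print(G.number_of_nodes())   # 4 actors
print(G.number_of_edges())   # 6 edges, one per pair: 4 * 3 / 2
print(G["Michael J. Fox"]["Lea Thompson"]["title"])
```

Note that a plain `nx.Graph` keeps a single edge per pair of nodes, so if two actors share several movies, a later call overwrites the `title` stored on their edge.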
④ nx.shortest_path(G, source, target) returns the shortest path between two actors or actresses. G.degree(v) returns the number of edges a node has, which tells us how many co-stars that person is linked to.
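A small illustration of both calls, using a made-up toy network (the names and edge titles below are hypothetical):

```python
import networkx as nx

G = nx.Graph()
# Hypothetical toy network: a chain Alice - Bob - Carol - Dave, plus Alice - Eve.
G.add_edge("Alice", "Bob", title="abc")
G.add_edge("Bob", "Carol", title="def")
G.add_edge("Carol", "Dave", title="ghi")
G.add_edge("Alice", "Eve", title="jkl")

path = nx.shortest_path(G, "Eve", "Dave")
print(path)               # ['Eve', 'Alice', 'Bob', 'Carol', 'Dave']
print(len(path) - 1)      # 4 edges separate Eve and Dave
print(G.degree("Alice"))  # 2, because Alice is linked to Bob and Eve
```

The path is returned as a list of nodes, so the number of steps between two people is the length of the list minus one.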
- the final graph
    G = nx.Graph()
- calculate the centrality
    print(nx.closeness_centrality(G))
Closeness centrality:
{'Robert Downey Jr.': 1.0, 'Terrence Howard': 0.5769230769230769, 'Jeff Bridges': 0.5769230769230769, 'Shaun Toub': 0.5769230769230769, 'Gwyneth Paltrow': 1.0, 'Don Cheadle': 0.8333333333333334, 'Scarlett Johansson': 0.625, 'Sam Rockwell': 0.625, 'Mickey Rourke': 0.625, 'Samuel L. Jackson': 0.625, 'Guy Pearce': 0.6818181818181818, 'Rebecca Hall': 0.6818181818181818, 'Stéphanie Szostak': 0.6818181818181818, 'James Badge Dale': 0.6818181818181818, 'Jon Favreau': 0.6818181818181818, 'Ben Kingsley': 0.6818181818181818}
Degree centrality:
{'Robert Downey Jr.': 1.0, 'Terrence Howard': 0.26666666666666666, 'Jeff Bridges': 0.26666666666666666, 'Shaun Toub': 0.26666666666666666, 'Gwyneth Paltrow': 1.0, 'Don Cheadle': 0.8, 'Scarlett Johansson': 0.4, 'Sam Rockwell': 0.4, 'Mickey Rourke': 0.4, 'Samuel L. Jackson': 0.4, 'Guy Pearce': 0.5333333333333333, 'Rebecca Hall': 0.5333333333333333, 'Stéphanie Szostak': 0.5333333333333333, 'James Badge Dale': 0.5333333333333333, 'Jon Favreau': 0.5333333333333333, 'Ben Kingsley': 0.5333333333333333}
Betweenness centrality:
{'Robert Downey Jr.': 0.23333333333333342, 'Terrence Howard': 0.0, 'Jeff Bridges': 0.0, 'Shaun Toub': 0.0, 'Gwyneth Paltrow': 0.23333333333333342, 'Don Cheadle': 0.0761904761904762, 'Scarlett Johansson': 0.0, 'Sam Rockwell': 0.0, 'Mickey Rourke': 0.0, 'Samuel L. Jackson': 0.0, 'Guy Pearce': 0.0, 'Rebecca Hall': 0.0, 'Stéphanie Szostak': 0.0, 'James Badge Dale': 0.0, 'Jon Favreau': 0.0, 'Ben Kingsley': 0.0}
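The three outputs above are consistent with closeness, degree and betweenness centrality computed on a graph of 16 actors. The sketch below is one way to arrive at such values. It assumes the actors are grouped into the three casts listed in `casts`; this grouping and the movie titles are inferred from the degree values and are not stated in the article, and the cast lists are limited to the names that appear in the output.

```python
from itertools import combinations

import networkx as nx

# Assumed cast groupings (inferred from the output above, not given in the text).
casts = {
    "Iron Man": ["Robert Downey Jr.", "Terrence Howard", "Jeff Bridges",
                 "Shaun Toub", "Gwyneth Paltrow"],
    "Iron Man 2": ["Robert Downey Jr.", "Gwyneth Paltrow", "Don Cheadle",
                   "Scarlett Johansson", "Sam Rockwell", "Mickey Rourke",
                   "Samuel L. Jackson"],
    "Iron Man 3": ["Robert Downey Jr.", "Gwyneth Paltrow", "Don Cheadle",
                   "Guy Pearce", "Rebecca Hall", "Stéphanie Szostak",
                   "James Badge Dale", "Jon Favreau", "Ben Kingsley"],
}

G = nx.Graph()
for title, cast in casts.items():
    # Every pair of actors in the same movie gets an edge.
    for a, b in combinations(cast, 2):
        G.add_edge(a, b, title=title)

closeness = nx.closeness_centrality(G)      # based on average distance to all others
degree = nx.degree_centrality(G)            # fraction of other nodes directly linked
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through

for name in ["Robert Downey Jr.", "Don Cheadle", "Terrence Howard"]:
    print(name, closeness[name], degree[name], betweenness[name])
```

In this graph the two actors who appear in all three casts are linked to everyone, which is why they score 1.0 on both closeness and degree centrality and have the highest betweenness. Actors who appear in only one cast lie on no shortest path between other people, so their betweenness is 0.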
Thoughts
As the Internet grows and the world becomes more interconnected, people are linked to one another through many kinds of events, so the relationships between them can be complex. To map such a social network and find out who is most central in it, NetworkX is a good choice: it can build the relationship graph and calculate useful values such as centrality. The same approach also works for relationships between objects other than people. For example, we can model the different roads to a destination as a graph and then find the shortest route. When every road counts as one step, this can be done with breadth-first search (BFS). You can find more details through the link:
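To illustrate the breadth-first search idea mentioned above, here is a minimal sketch in plain Python. The function name `bfs_shortest_path` and the small `roads` map are hypothetical. The sketch treats every road as one step, so it finds the route with the fewest road segments rather than the shortest distance in kilometres.

```python
from collections import deque


def bfs_shortest_path(graph, source, target):
    """Return one path with the fewest edges from source to target, or None."""
    previous = {source: None}   # remembers how each node was first reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through `previous` to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical road map stored as an adjacency list.
roads = {
    "Home": ["A", "B"],
    "A": ["Home", "C"],
    "B": ["Home", "Destination"],
    "C": ["A", "Destination"],
    "Destination": ["B", "C"],
}

print(bfs_shortest_path(roads, "Home", "Destination"))  # ['Home', 'B', 'Destination']
```

BFS visits places in order of their distance from the start, so the first time it reaches the destination it has used the fewest possible steps.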