Apache spark graph processing : build, process, and analyze large-scale graphs with Spark / Rindra Ramamonjison ; foreword by Denny Lee.
Material type: Text
Series: Community experience distilled
Publisher: Birmingham, UK : Packt Publishing
Copyright date: ©2015
Description: 1 online resource (1 volume) : illustrations
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9781784398958; 1784398950
Other title: Build, process, and analyze large-scale graphs with Spark
Subject(s): Spark (Electronic resource : Apache Software Foundation) | Graphic methods -- Computer programs | Electronic data processing | COMPUTERS -- General
Genre/Form: Electronic books.
Additional physical formats: Print version: Apache Spark Graph Processing.
DDC classification: 006.3/12
LOC classification: QA76.9.D343
Includes bibliographical references and index.
Online resource; title from cover page (Safari, viewed September 28, 2015).
Cover; Copyright; Credits; Foreword; About the Author; About the Reviewer; www.PacktPub.com; Table of Contents; Preface.
Chapter 1: Getting Started with Spark and GraphX; Downloading and installing Spark 1.4.1; Experimenting with the Spark shell; Getting started with GraphX; Building a tiny social network; Loading the data; The property graph; Transforming RDDs to VertexRDD and EdgeRDD; Introducing graph operations; Building and submitting a standalone application; Writing and configuring a Spark program; Building the program with the Scala Build Tool; Deploying and running with spark-submit.
Chapter 2: Building and Exploring Graphs; Network datasets; The communication network; Flavor networks; Social ego networks; Graph builders; The Graph factory method; edgeListFile; fromEdges; fromEdgeTuples; Building graphs; Building directed graphs; Building a bipartite graph; Building a weighted social ego network; Computing the degrees of the network nodes; In-degree and out-degree of the Enron email network; Degrees in the bipartite food network; Degree histogram of the social ego networks; Summary.
Chapter 3: Graph Analysis and Visualization; Network datasets; The graph visualization; Installing the GraphStream and BreezeViz libraries; Visualizing the graph data; Plotting the degree distribution; The analysis of network connectedness; Finding the connected components; Counting triangles and computing clustering coefficients; The network centrality and PageRank; How PageRank works; Ranking web pages; Scala Build Tool revisited; Organizing build definitions; Managing library dependencies; A preview of the steps; Running tasks with SBT commands; Summary.
Chapter 4: Transforming and Shaping Up Graphs to Your Needs; Transforming the vertex and edge attributes; mapVertices; mapEdges; mapTriplets; Modifying graph structures; The reverse operator; The subgraph operator; The mask operator; The groupEdges operator; Joining graph datasets; joinVertices; outerJoinVertices; Example -- Hollywood movie graph; Data operations on VertexRDD and EdgeRDD; Mapping VertexRDD and EdgeRDD; Filtering VertexRDDs; Joining VertexRDDs; Joining EdgeRDDs; Reversing edge directions; Collecting neighboring information; Example -- from food network to flavor pairing; Summary.
Chapter 5: Creating Custom Graph Aggregation Operators; NCAA College Basketball datasets; The aggregateMessages operator; EdgeContext; Abstracting out the aggregation; Keeping things DRY; Coach wants more numbers; Calculating average points per game; Defense stats -- D matters as in direction; Joining average stats into a graph; Performance optimization; The MapReduceTriplets operator; Summary.
Chapter 6: Iterative Graph-Parallel Processing with Pregel; The Pregel computational model; Example -- iterating towards the social equality; The Pregel API in GraphX; Community detection through label propagation; The Pregel implementation of PageRank; Summary.