#include <QuerySchedulerServer.h>

Inheritance diagram for pdb::QuerySchedulerServer:

Collaboration diagram for pdb::QuerySchedulerServer:

Public Member Functions
	QuerySchedulerServer (PDBLoggerPtr logger, ConfigurationPtr conf, std::shared_ptr< StatisticsDB > statisticsDB, bool pseudoClusterMode=false, double partitionToCoreRatio=0.75)

	QuerySchedulerServer (int port, PDBLoggerPtr logger, ConfigurationPtr conf, std::shared_ptr< StatisticsDB > statisticsDB, bool pseudoClusterMode=false, double partitionToCoreRatio=0.75)

	~QuerySchedulerServer ()

void	registerHandlers (PDBServer &forMe) override

void	collectStats ()

void	cleanup () override

StatisticsPtr	getStats ()

std::string	getNextJobId ()

Public Member Functions inherited from pdb::ServerFunctionality
template<class Functionality >
Functionality &	getFunctionality ()

void	recordServer (PDBServer &recordMe)

PDBWorkerPtr	getWorker ()

PDBLoggerPtr	getLogger ()

Protected Member Functions
void	initialize ()

void	initializeForServerMode ()

void	initializeForPseudoClusterMode ()

void	scheduleStages (std::vector< Handle< AbstractJobStage >> &stagesToSchedule, std::shared_ptr< ShuffleInfo > shuffleInfo)

void	prepareAndScheduleStage (Handle< AbstractJobStage > &stage, unsigned long node, int &counter, PDBBuzzerPtr &callerBuzzer)

template<typename T >
bool	scheduleStage (unsigned long node, Handle< T > &stage, PDBCommunicatorPtr communicator)

PDBCommunicatorPtr	getCommunicatorToNode (int port, std::string &ip)

Handle< TupleSetJobStage >	getStageToSend (unsigned long index, Handle< TupleSetJobStage > &stage)

Handle< AggregationJobStage >	getStageToSend (unsigned long index, Handle< AggregationJobStage > &stage)

Handle < BroadcastJoinBuildHTJobStage >	getStageToSend (unsigned long index, Handle< BroadcastJoinBuildHTJobStage > &stage)

Handle < HashPartitionedJoinBuildHTJobStage >	getStageToSend (unsigned long index, Handle< HashPartitionedJoinBuildHTJobStage > &stage)

void	collectStatsForNode (int node, int &counter, PDBBuzzerPtr &callerBuzzer)

void	updateStats (Handle< SetIdentifier > setToUpdateStats)

pair< bool, basic_string< char > >	executeComputation (Handle< ExecuteComputation > &request, PDBCommunicatorPtr &sendUsingMe)

pair< bool, basic_string< char > >	registerReplica (Handle< RegisterReplica > &request, PDBCommunicatorPtr &sendUsingMe)

void	extractPipelineStages (int &jobStageId, vector< Handle< AbstractJobStage >> &jobStages, vector< Handle< SetIdentifier >> &intermediateSets)

void	createIntermediateSets (DistributedStorageManagerClient &dsmClient, vector< Handle< SetIdentifier >> &intermediateSets)

void	removeIntermediateSets (DistributedStorageManagerClient &dsmClient)

void	removeUnusedIntermediateSets (DistributedStorageManagerClient &dsmClient, vector< Handle< SetIdentifier >> &intermediateSets)

void	requestStatistics (PDBCommunicatorPtr &communicator, bool &success, string &errMsg) const

Protected Attributes
std::vector < StandardResourceInfoPtr > *	standardResources

int	port

std::vector< Handle < SetIdentifier > >	interGlobalSets

PDBLoggerPtr	logger

ConfigurationPtr	conf

std::shared_ptr< StatisticsDB >	statisticsDB

bool	pseudoClusterMode

pthread_mutex_t	connection_mutex

SequenceID	seqId

std::string	jobId

double	partitionToCoreRatio

std::shared_ptr < PhysicalOptimizer >	physicalOptimizerPtr

StatisticsPtr	statsForOptimization

std::shared_ptr< ShuffleInfo >	shuffleInfo

Detailed Description

This class is working on the Manager node to schedule JobStages dynamically from TCAP logical plan So far following JobStages are supported:

– TupleSetJobStage – AggregationJobStage – BroadcastJoinBuildHTJobStage – HashPartitionJoinBuildHTJobStage

Once the QuerySchedulerServer receives a request (A TCAP program) in the form of an

See Also: pdb::ExecuteComputation object it parses and analyzes the TCAP program as a DAG based on a cost model using a greedy algorithm. The goal is to minimize intermediate data. The scheduling is dynamic and lazy, and only one the JobStages scheduled last time were executed, it will schedule later stages, to maximize the information needed. The JobStages will be dispatched to all workers for execution.

Definition at line 62 of file QuerySchedulerServer.h.

Constructor & Destructor Documentation

pdb::QuerySchedulerServer::QuerySchedulerServer	(	PDBLoggerPtr	logger,
		ConfigurationPtr	conf,
		std::shared_ptr< StatisticsDB >	statisticsDB,
		bool	pseudoClusterMode = `false`,
		double	partitionToCoreRatio = `0.75`
	)

Constructor for the case when we assume that the port where we are running the server is 8108

Parameters

logger	an instance of the PDBLogger
conf
pseudoClusterMode	indicator whether we are running in the pseudo cluster mode or not
partitionToCoreRatio	the ratio between the number of partitions on a node and the number of cores

Definition at line 43 of file QuerySchedulerServer.cc.

pdb::QuerySchedulerServer::QuerySchedulerServer	(	int	port,
		PDBLoggerPtr	logger,
		ConfigurationPtr	conf,
		std::shared_ptr< StatisticsDB >	statisticsDB,
		bool	pseudoClusterMode = `false`,
		double	partitionToCoreRatio = `0.75`
	)

Constructor for the case when where we specify the port number of the server

Parameters

port	is the port on which the server that contains this functionality is running
logger	an instance of the PDBLogger
conf
pseudoClusterMode	indicator whether we are running in the pseudo cluster mode or not
partitionToCoreRatio	the ratio between the number of partitions on a node and the number of cores

Definition at line 61 of file QuerySchedulerServer.cc.

pdb::QuerySchedulerServer::~QuerySchedulerServer ( )

Destructor used to destroy the QuerySchedulerServer

Definition at line 39 of file QuerySchedulerServer.cc.

Member Function Documentation

void pdb::QuerySchedulerServer::cleanup ( )

overridevirtual

Cleans up the query scheduler server so it can be used for the next computation

Reimplemented from pdb::ServerFunctionality.

Definition at line 79 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::collectStats ( )

Collects the statistics about the sets from each node and updates them [statsForOptimization]

Definition at line 452 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::collectStatsForNode	(	int	node,
		int &	counter,
		PDBBuzzerPtr &	callerBuzzer
	)

protected

Collects the stats for one node

Parameters

node	the node we are collecting the stats for
counter	the counter that is increased when we are finishing updating the stats
callerBuzzer	the buzzer that signals that we are finished

Definition at line 491 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::createIntermediateSets	(	DistributedStorageManagerClient &	dsmClient,
		vector< Handle< SetIdentifier >> &	intermediateSets
	)

protected

Given a vector of SetIdentifiers this method issues their creation

Parameters

dsmClient	an instance of the DistributedStorageManagerClient that needs to create the sets
intermediateSets	the vector of intermediate sets

Definition at line 803 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

pair< bool, basic_string< char > > pdb::QuerySchedulerServer::executeComputation	(	Handle< ExecuteComputation > &	request,
		PDBCommunicatorPtr &	sendUsingMe
	)

protected

This method executes a PDB computation given by the ExecuteComputation object, that was sent by a client

Parameters

request	the object that describes the computation
sendUsingMe	an instance of the PDBCommunicator that points to the client

do the physical planning

create intermediate sets

schedule this job stages

Definition at line 633 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::extractPipelineStages	(	int &	jobStageId,
		vector< Handle< AbstractJobStage >> &	jobStages,
		vector< Handle< SetIdentifier >> &	intermediateSets
	)

protected

This method finds the best source operator using a heuristic, then uses this operator to extract a sequence of of pipelinable stages.

Parameters

jobStageId	the of last executed job stage
jobStages	a vector where we want to store the sequence of jobStages
intermediateSets	a vector where we want to store the information about the intermediate sets that need to be generated

Definition at line 829 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

PDBCommunicatorPtr pdb::QuerySchedulerServer::getCommunicatorToNode	(	int	port,
		std::string &	ip
	)

protected

Connects to the node with the provided ip and port and returns an instance of the communicator

See Also: pdb::PDBCommunicator

Parameters

port	the port of the node
ip	the ip address of the node

Returns: returns null_ptr if it fails, the communicator to the node if it succeeds

Definition at line 272 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

std::string pdb::QuerySchedulerServer::getNextJobId ( )

inline

Returns the id of the job we are about to run Has the following format Job_Year_Month_Day_Hour-Minute_Second_{Next number from the seqId sequence generator}

Returns: the id of the next job, a string

Definition at line 127 of file QuerySchedulerServer.h.

Here is the call graph for this function:

Here is the caller graph for this function:

Handle< TupleSetJobStage > pdb::QuerySchedulerServer::getStageToSend	(	unsigned long	index,
		Handle< TupleSetJobStage > &	stage
	)

protected

Makes a deep copy of the TupleSetJobStage, fills in additional information about the node we are sending it and returns it

Parameters

stage an instance of the TupleSetJobStage

Returns: the copy with additional information

Definition at line 342 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

Handle< AggregationJobStage > pdb::QuerySchedulerServer::getStageToSend	(	unsigned long	index,
		Handle< AggregationJobStage > &	stage
	)

protected

Makes a deep copy of the AggregationJobStage, fills in additional information about the node we are sending it and returns it

Parameters

stage an instance of the AggregationJobStage

Returns: the copy with additional information

Definition at line 387 of file QuerySchedulerServer.cc.

Handle< BroadcastJoinBuildHTJobStage > pdb::QuerySchedulerServer::getStageToSend	(	unsigned long	index,
		Handle< BroadcastJoinBuildHTJobStage > &	stage
	)

protected

Makes a deep copy of the stage provided and detaches it from the logical plan, by setting it to null and fill in additional information about the node we are sending it to

Parameters

stage an instance of the BroadcastJoinBuildHTJobStage

Returns: the copy with additional information

Definition at line 411 of file QuerySchedulerServer.cc.

Handle< HashPartitionedJoinBuildHTJobStage > pdb::QuerySchedulerServer::getStageToSend	(	unsigned long	index,
		Handle< HashPartitionedJoinBuildHTJobStage > &	stage
	)

protected

Makes a deep copy of the stage provided and detaches it from the logical plan, by setting it to null and fill in additional information about the node we are sending it to

Parameters

stage an instance of the HashPartitionedJoinBuildHTJobStage

Returns: the copy with additional information

Definition at line 427 of file QuerySchedulerServer.cc.

StatisticsPtr pdb::QuerySchedulerServer::getStats ( )

Returns the statistics that are being used for optimization

Returns: the statistics

Definition at line 158 of file QuerySchedulerServer.cc.

void pdb::QuerySchedulerServer::initialize ( )

protected

Initializes the QuerySchedulerServer more specifically the

See Also: pdb::QuerySchedulerServer::standardResources by fetching the necessary information from the resource manager

Definition at line 92 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::initializeForPseudoClusterMode ( )

protected

TODO Ask Jia what the difference is between a resource and a node add a proper description

Definition at line 107 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::initializeForServerMode ( )

protected

TODO Ask Jia what the difference is between a resource and a node add a proper description

Definition at line 131 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::prepareAndScheduleStage	(	Handle< AbstractJobStage > &	stage,
		unsigned long	node,
		int &	counter,
		PDBBuzzerPtr &	callerBuzzer
	)

protected

This method takes in an

See Also: pdb::AbstractJobStage infers its subtype, opens up a communicator to the specified node and schedules the stage at it.

Parameters

stage	the stage we want to send
node	the node we want to send the stage to
counter	a reference to the counter that needs to be increased once the execution of the stage is complete
callerBuzzer	the buzzer we use to notify the calling thread that we are done executing

Definition at line 200 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::registerHandlers ( PDBServer & forMe )

overridevirtual

This method is from the serverFunctionality interface... It register the query scheduler handlers to the provided server

Implements pdb::ServerFunctionality.

Definition at line 573 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

pair< bool, basic_string< char > > pdb::QuerySchedulerServer::registerReplica	(	Handle< RegisterReplica > &	request,
		PDBCommunicatorPtr &	sendUsingMe
	)

protected

This method registers a replica with statisticsDB per client request

Parameters

request	the object that describes the computation
sendUsingMe	an instance of the PDBCommunicator that points to the client

Definition at line 595 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::removeIntermediateSets ( DistributedStorageManagerClient & dsmClient )

protected

This method removes all the intermediate sets that we needed to continue our execution They are kept in the interGlobalSets vector

Parameters

dsmClient an instance of the DistributedStorageManagerClient that needs to remove the sets

Definition at line 779 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::removeUnusedIntermediateSets	(	DistributedStorageManagerClient &	dsmClient,
		vector< Handle< SetIdentifier >> &	intermediateSets
	)

protected

Given a vector of SetIdentifiers this method issues their removal Sets that are going to be used later in the execution are not removed, they are cleaned up with the method

See Also: QuerySchedulerServer::removeIntermediateSets

Parameters

dsmClient	dsmClient an instance of the DistributedStorageManagerClient that needs to remove the sets
intermediateSets	the vector of intermediate sets

Definition at line 746 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::requestStatistics	(	PDBCommunicatorPtr &	communicator,
		bool &	success,
		string &	errMsg
	)		const

protected

This method takes in a communicator to a node and issues a request for statistics about the stored sets

Parameters

communicator	the communicator to the node
success	a reference a boolean that will set to true if the request succeeds, false otherwise
errMsg	the error message that is gonna be set, if an error occurs

Definition at line 550 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

template<typename T >

bool pdb::QuerySchedulerServer::scheduleStage	(	unsigned long	node,
		Handle< T > &	stage,
		PDBCommunicatorPtr	communicator
	)

protected

This method schedules a pipeline stage given the index of a specified node and a communicator to that node.

Parameters

node	is the index of the node we want to schedule the stage
stage	is the pipeline stage we want to schedule
communicator	is the communicator to a node we are going to use for that

Definition at line 302 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::scheduleStages	(	std::vector< Handle< AbstractJobStage >> &	stagesToSchedule,
		std::shared_ptr< ShuffleInfo >	shuffleInfo
	)

protected

This method is used to schedule dynamic pipeline stages It must be invoked after initialize() and before cleanup()

Parameters

stagesToSchedule	is a vector of all the stages we want to schedule
shuffleInfo	is the shuffle information for job stages that needs repartitioning

Definition at line 163 of file QuerySchedulerServer.cc.

Here is the call graph for this function:

Here is the caller graph for this function:

void pdb::QuerySchedulerServer::updateStats ( Handle< SetIdentifier > setToUpdateStats )

protected

Updates the optimization stats for a given set

Parameters

setToUpdateStats the set we are updating (contains also info about the pages and size)

Definition at line 555 of file QuerySchedulerServer.cc.

Here is the caller graph for this function:

Member Data Documentation

ConfigurationPtr pdb::QuerySchedulerServer::conf

protected

The configuration of the node provided by the constructor

Definition at line 339 of file QuerySchedulerServer.h.

pthread_mutex_t pdb::QuerySchedulerServer::connection_mutex

protected

This is used to synchronize the communicator More specifically the part where we are creating them and connecting them to a remote node with the connectToInternetServer method

Definition at line 356 of file QuerySchedulerServer.h.

std::vector<Handle<SetIdentifier> > pdb::QuerySchedulerServer::interGlobalSets

protected

Set identifiers for shuffle set, we need to create and remove them at scheduler, so that they exist at any node when any other node needs to write to it

Definition at line 329 of file QuerySchedulerServer.h.

std::string pdb::QuerySchedulerServer::jobId

protected

The id of the current job. Used to identify the job and the database for it's results

Definition at line 367 of file QuerySchedulerServer.h.

PDBLoggerPtr pdb::QuerySchedulerServer::logger

protected

An instance of the PDBLogger set in the constructor

Definition at line 334 of file QuerySchedulerServer.h.

double pdb::QuerySchedulerServer::partitionToCoreRatio

protected

Used to calculate the the number of partitions on a node, based on the number cpu cores on that node more specifically partitionToCoreRatio = numPartitionsOnThisNode/numCores

Definition at line 373 of file QuerySchedulerServer.h.

std::shared_ptr<PhysicalOptimizer> pdb::QuerySchedulerServer::physicalOptimizerPtr

protected

An instance of the PhyscialAnalyzer. We use it to do the dynamic planning. More specifically to find the source that is we should start from using heuristics and to generate pipeline stages starting from this source

Definition at line 380 of file QuerySchedulerServer.h.

int pdb::QuerySchedulerServer::port

protected

The port through which we access the functionalities on this node (port the PDBServer listens to)

Definition at line 323 of file QuerySchedulerServer.h.

bool pdb::QuerySchedulerServer::pseudoClusterMode

protected

True if we are running PDB in pseudo cluster mode false otherwise

Definition at line 349 of file QuerySchedulerServer.h.

SequenceID pdb::QuerySchedulerServer::seqId

protected

Used to generate a unique sequential ID getNextSequenceID is thread safe to call

Definition at line 362 of file QuerySchedulerServer.h.

std::shared_ptr<ShuffleInfo> pdb::QuerySchedulerServer::shuffleInfo

protected

Wraps shuffle information for job stages that needs repartitioning data

Definition at line 394 of file QuerySchedulerServer.h.

std::vector<StandardResourceInfoPtr>* pdb::QuerySchedulerServer::standardResources

protected

A vector containing the information about the resources of each node

Definition at line 318 of file QuerySchedulerServer.h.

std::shared_ptr<StatisticsDB> pdb::QuerySchedulerServer::statisticsDB

protected

A pointer to StatisticsDB that manages various statistics

Definition at line 344 of file QuerySchedulerServer.h.

StatisticsPtr pdb::QuerySchedulerServer::statsForOptimization

protected

Contains the information about every set on every node, more specifically :

the number of pages
the size of the pages
and the total number of bytes the set has

Definition at line 389 of file QuerySchedulerServer.h.

The documentation for this class was generated from the following files:

plinycompute/pdb/src/serverFunctionalities/headers/QuerySchedulerServer.h
plinycompute/pdb/src/serverFunctionalities/source/QuerySchedulerServer.cc

Public Member Functions

Protected Member Functions

Protected Attributes

Detailed Description

Constructor & Destructor Documentation

Member Function Documentation

Member Data Documentation