A platform for high-performance distributed tool and library development written in C++. It can be deployed in two different cluster modes: standalone or distributed. API for v0.5.0, released on June 13, 2018.
pdb::PhysicalOptimizer Class Reference

#include <PhysicalOptimizer.h>


Public Member Functions

 PhysicalOptimizer (std::vector< AbstractPhysicalNodePtr > &sources, PDBLoggerPtr &logger)
 
 ~PhysicalOptimizer ()
 
bool getNextStagesOptimized (std::vector< pdb::Handle< AbstractJobStage >> &physicalPlanToOutput, std::vector< pdb::Handle< SetIdentifier >> &interGlobalSets, StatisticsPtr &stats, int &jobStageId)
 
bool hasSources ()
 
bool hasConsumers (Handle< SetIdentifier > &set)
 
AbstractPhysicalNodePtr getBestNode (StatisticsPtr &ptr)
 

Private Attributes

std::map< std::string, AbstractPhysicalNodePtr > sourceNodes
 
std::set< std::string > penalizedSets
 
PDBLoggerPtr logger
 

Static Private Attributes

static constexpr double SOURCE_PENALIZE_FACTOR = 1000.00
 

Detailed Description

This class takes as input a graph made out of AbstractPhysicalNode objects and performs physical optimization on it.

This is accomplished by calling getNextStagesOptimized repeatedly, with each call generating a sequence of JobStages. Each call must be given the current storage statistics so that the optimizer can determine the best source to start from.

The statistics should be updated after each sequence of JobStages is executed, so that they reflect the current state of storage.

There are four types of JobStages that can be generated:
- TupleSetJobStage: a pipeline
- AggregationJobStage: performs the aggregation on shuffled data
- BroadcastJoinBuildHTJobStage: builds the hash table for the broadcast join
- HashPartitionedJoinBuildHTJobStage: builds the hash table for the partitioned join
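The iterative protocol described above (plan, execute, refresh statistics, plan again) can be sketched as a driver loop. This is a minimal, self-contained sketch and not the PDB API: JobStage, Statistics, MockOptimizer, executeAndUpdateStats, and runAllStages are hypothetical stand-ins, and the loop assumes the caller keeps requesting partial plans while hasSources() returns true.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for pdb::AbstractJobStage and the storage statistics.
struct JobStage {
    int id;
    std::string type;  // e.g. "TupleSetJobStage"
};

struct Statistics {
    int executedStages = 0;  // placeholder for real per-set statistics
};

// Hypothetical mock with the same call pattern as pdb::PhysicalOptimizer:
// each call emits one partial plan until no sources remain.
class MockOptimizer {
public:
    explicit MockOptimizer(int sources) : remainingSources(sources) {}

    bool hasSources() const { return remainingSources > 0; }

    bool getNextStagesOptimized(std::vector<JobStage>& physicalPlanToOutput,
                                const Statistics& stats, int& jobStageId) {
        (void)stats;  // a real optimizer would use stats to pick the best source
        if (remainingSources == 0) return false;
        physicalPlanToOutput.push_back({jobStageId++, "TupleSetJobStage"});
        physicalPlanToOutput.push_back({jobStageId++, "AggregationJobStage"});
        --remainingSources;
        return true;
    }

private:
    int remainingSources;
};

// Hypothetical executor: runs a partial plan, then refreshes the statistics.
void executeAndUpdateStats(const std::vector<JobStage>& stages, Statistics& stats) {
    for (const JobStage& stage : stages) {
        std::cout << "stage " << stage.id << ": " << stage.type << '\n';
        ++stats.executedStages;
    }
}

// Driver loop: plan, execute, update statistics, repeat. Returns the number of stages generated.
int runAllStages(MockOptimizer& optimizer, Statistics& stats) {
    const int executedBefore = stats.executedStages;
    int jobStageId = 0;
    while (optimizer.hasSources()) {
        std::vector<JobStage> partialPlan;
        if (!optimizer.getNextStagesOptimized(partialPlan, stats, jobStageId)) {
            break;  // planning failed; stop instead of looping forever
        }
        executeAndUpdateStats(partialPlan, stats);
    }
    // every generated stage was executed exactly once
    assert(stats.executedStages - executedBefore == jobStageId);
    return jobStageId;
}
```

Passing jobStageId by reference, as the real signature does, lets the stage ids keep increasing across successive partial plans instead of restarting at zero on every call.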

Definition at line 55 of file PhysicalOptimizer.h.

Constructor & Destructor Documentation

pdb::PhysicalOptimizer::PhysicalOptimizer ( std::vector< AbstractPhysicalNodePtr > &  sources,
PDBLoggerPtr &  logger 
)

Constructs the PhysicalOptimizer from the source nodes of the physical plan graph and a logger.

Parameters
sources - the source nodes of the graph to analyze
logger - an instance of the PDBLogger

Definition at line 25 of file PhysicalOptimizer.cc.

pdb::PhysicalOptimizer::~PhysicalOptimizer ( )

Definition at line 126 of file PhysicalOptimizer.cc.

Member Function Documentation

AbstractPhysicalNodePtr pdb::PhysicalOptimizer::getBestNode ( StatisticsPtr &  ptr)

Returns the best source node to start the next partial plan from, selected heuristically using the provided statistics.

Returns
the selected source node

Definition at line 99 of file PhysicalOptimizer.cc.
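A minimal sketch of the kind of heuristic getBestNode could apply, assuming each source set has a single scalar cost and that a penalized set's cost is multiplied by SOURCE_PENALIZE_FACTOR. The SourceCosts map, the cost model, and pickBestSource are hypothetical; the actual heuristic in PhysicalOptimizer.cc may differ.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <set>
#include <string>

// Same value as the documented pdb::PhysicalOptimizer::SOURCE_PENALIZE_FACTOR.
constexpr double SOURCE_PENALIZE_FACTOR = 1000.00;

// Hypothetical: the cost of each source set, keyed by "databaseName:setName".
using SourceCosts = std::map<std::string, double>;

// Returns the key of the cheapest source. Penalized sets have their cost multiplied
// by SOURCE_PENALIZE_FACTOR, so they are chosen only if nothing cheaper remains.
std::string pickBestSource(const SourceCosts& costs,
                           const std::set<std::string>& penalizedSets) {
    assert(!costs.empty());
    std::string best;
    double bestCost = std::numeric_limits<double>::infinity();
    for (const auto& entry : costs) {
        const std::string& key = entry.first;
        double effective = entry.second;
        if (penalizedSets.count(key) != 0) {
            effective *= SOURCE_PENALIZE_FACTOR;
        }
        if (effective < bestCost) {
            bestCost = effective;
            best = key;
        }
    }
    return best;
}
```

Scaling the cost rather than excluding the set keeps penalized sources eligible as a last resort, which matters when every remaining source happens to be penalized.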


bool pdb::PhysicalOptimizer::getNextStagesOptimized ( std::vector< pdb::Handle< AbstractJobStage >> &  physicalPlanToOutput,
std::vector< pdb::Handle< SetIdentifier >> &  interGlobalSets,
StatisticsPtr &  stats,
int &  jobStageId 
)

Generates a sequence of job stages that make up a partial physical plan and stores it in physicalPlanToOutput. After these stages are executed, the statistics about the newly created sets are gathered and used to generate the next partial plan.

Parameters
physicalPlanToOutput - the list that receives the generated sequence of job stages
interGlobalSets - a list of intermediate sets that need to be created
stats - the storage statistics used to determine the best starting source
jobStageId - the id of the current job stage
Returns
true if we succeeded in creating the partial physical plan.

Definition at line 36 of file PhysicalOptimizer.cc.


bool pdb::PhysicalOptimizer::hasConsumers ( Handle< SetIdentifier > &  set)

Returns true if the provided source still has any consumers that need to be processed.

Parameters
set - the set identifier of the source to check for consumers in later stages
Returns
true if the provided source has any consumers, false otherwise

Definition at line 84 of file PhysicalOptimizer.cc.

bool pdb::PhysicalOptimizer::hasSources ( )

Check if we still have some sources to process

Returns
true if we do, false otherwise

Definition at line 80 of file PhysicalOptimizer.cc.
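One way these two bookkeeping checks could be expressed, assuming a map shaped like sourceNodes (see Member Data Documentation) and assuming a source is dropped once it has no remaining consumers. SourceNode, its consumers field, SourceTracker, and removeIfExhausted are hypothetical simplifications of AbstractPhysicalNode and are not the PDB API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of an AbstractPhysicalNode: only its consumers matter here.
struct SourceNode {
    std::vector<std::string> consumers;  // names of later stages that read this set
};

using SourceNodePtr = std::shared_ptr<SourceNode>;

class SourceTracker {
public:
    void addSource(const std::string& key, SourceNodePtr node) {
        assert(node != nullptr);
        sourceNodes[key] = std::move(node);
    }

    // Mirrors hasSources(): true while any source set is still registered.
    bool hasSources() const { return !sourceNodes.empty(); }

    // Mirrors hasConsumers(): true if the given source set still feeds later stages.
    bool hasConsumers(const std::string& key) const {
        auto it = sourceNodes.find(key);
        return it != sourceNodes.end() && !it->second->consumers.empty();
    }

    // Hypothetical: drop a source once none of its consumers remain.
    void removeIfExhausted(const std::string& key) {
        if (!hasConsumers(key)) sourceNodes.erase(key);
    }

private:
    std::map<std::string, SourceNodePtr> sourceNodes;  // keyed by "databaseName:setName"
};
```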

Member Data Documentation

PDBLoggerPtr pdb::PhysicalOptimizer::logger
private

An instance of the PDBLogger

Definition at line 122 of file PhysicalOptimizer.h.

std::set<std::string> pdb::PhysicalOptimizer::penalizedSets
private

Penalized source sets in the form databaseName:setName

Definition at line 117 of file PhysicalOptimizer.h.

constexpr double pdb::PhysicalOptimizer::SOURCE_PENALIZE_FACTOR = 1000.00
static private

The factor applied to the cost of a source when that source is penalized.

Definition at line 106 of file PhysicalOptimizer.h.

std::map<std::string, AbstractPhysicalNodePtr> pdb::PhysicalOptimizer::sourceNodes
private

Map from the name of each source set, in the form "databaseName:setName", to the AbstractPhysicalNodePtr associated with it.

Definition at line 112 of file PhysicalOptimizer.h.
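The "databaseName:setName" key format shared by sourceNodes and penalizedSets could be produced and split by small helpers like the following. makeSetKey and splitSetKey are hypothetical names that are not taken from PhysicalOptimizer.cc, and the sketch assumes database names never contain a ':' character.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: builds the "databaseName:setName" key.
std::string makeSetKey(const std::string& databaseName, const std::string& setName) {
    // a ':' inside the database name would make the key ambiguous
    assert(databaseName.find(':') == std::string::npos);
    return databaseName + ":" + setName;
}

// Hypothetical inverse: splits at the first ':' (database names are assumed to contain none).
std::pair<std::string, std::string> splitSetKey(const std::string& key) {
    const std::string::size_type pos = key.find(':');
    if (pos == std::string::npos) return {key, ""};
    return {key.substr(0, pos), key.substr(pos + 1)};
}
```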


The documentation for this class was generated from the following files: