A platform for high-performance distributed tool and library development written in C++. It can be deployed in two different cluster modes: standalone or distributed. API for v0.5.0, released on June 13, 2018.
pdb::PhysicalOptimizer Class Reference
#include <PhysicalOptimizer.h>
Public Member Functions | |
PhysicalOptimizer (std::vector< AbstractPhysicalNodePtr > &sources, PDBLoggerPtr &logger) | |
~PhysicalOptimizer () | |
bool | getNextStagesOptimized (std::vector< pdb::Handle< AbstractJobStage >> &physicalPlanToOutput, std::vector< pdb::Handle< SetIdentifier >> &interGlobalSets, StatisticsPtr &stats, int &jobStageId) |
bool | hasSources () |
bool | hasConsumers (Handle< SetIdentifier > &set) |
AbstractPhysicalNodePtr | getBestNode (StatisticsPtr &ptr) |
Private Attributes | |
std::map< std::string, AbstractPhysicalNodePtr > | sourceNodes |
std::set< std::string > | penalizedSets |
PDBLoggerPtr | logger |
Static Private Attributes | |
static constexpr double | SOURCE_PENALIZE_FACTOR = 1000.00 |
This class takes in as input a graph made out of
This is accomplished by iteratively calling the method getNextStagesOptimized to generate a sequence of JobStages. Each call to getNextStagesOptimized must be given the storage statistics, so that it can determine the best starting source.
The statistics should be iteratively updated after the execution of each sequence of JobStages, to reflect the current state.
There are four types of JobStages that can be generated:
– TupleSetJobStage: a pipeline
– AggregationJobStage: performs the aggregation on shuffled data
– BroadcastJoinBuildHTJobStage: builds a hash table for the broadcast join
– HashPartitionedJoinBuildHTJobStage: builds the hash table for the partitioned join
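The iterative use described above can be sketched as a simple driver loop. This is only an illustration under stated assumptions: MockOptimizer, MockStage, MockStats and runAll are hypothetical stand-ins, not part of the PDB API, and the real getNextStagesOptimized also takes the interGlobalSets argument, which is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for pdb::Handle<AbstractJobStage> and StatisticsPtr.
struct MockStage { int id; std::string kind; };
struct MockStats { int updates = 0; };

// Hypothetical stand-in for PhysicalOptimizer: every source yields one stage.
class MockOptimizer {
public:
    explicit MockOptimizer(int sources) : remaining(sources) {}

    bool hasSources() const { return remaining > 0; }

    // Appends the next partial plan to `plan` and advances jobStageId.
    bool getNextStagesOptimized(std::vector<MockStage>& plan,
                                MockStats& /*stats*/,
                                int& jobStageId) {
        if (remaining == 0) return false;
        plan.push_back({jobStageId++, "TupleSetJobStage"});
        --remaining;
        return true;
    }

private:
    int remaining;
};

// Plans one sequence of stages at a time, "executes" it, then refreshes the
// statistics before asking for the next sequence.
int runAll(MockOptimizer& optimizer, MockStats& stats) {
    int jobStageId = 0;
    int executed = 0;
    while (optimizer.hasSources()) {
        std::vector<MockStage> plan;
        if (!optimizer.getNextStagesOptimized(plan, stats, jobStageId)) break;
        assert(!plan.empty());
        executed += static_cast<int>(plan.size());  // stand-in for execution
        ++stats.updates;                            // stand-in for a statistics refresh
    }
    return executed;
}
```

In a real deployment the execution and statistics refresh steps would be performed by the scheduler rather than by counters as in this sketch.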
Definition at line 55 of file PhysicalOptimizer.h.
pdb::PhysicalOptimizer::PhysicalOptimizer | ( | std::vector< AbstractPhysicalNodePtr > & | sources, |
PDBLoggerPtr & | logger | ||
) |
Constructs the PhysicalOptimizer from the source nodes of the graph to analyze and a logger
sources | the source nodes of the graph to analyze |
logger | an instance of the PDBLogger |
Definition at line 25 of file PhysicalOptimizer.cc.
pdb::PhysicalOptimizer::~PhysicalOptimizer | ( | ) |
Definition at line 126 of file PhysicalOptimizer.cc.
AbstractPhysicalNodePtr pdb::PhysicalOptimizer::getBestNode | ( | StatisticsPtr & | ptr | ) |
Returns the best source node based on heuristics
Definition at line 99 of file PhysicalOptimizer.cc.
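The page does not spell out the heuristic, so the following is only a guess at its general shape: pick the source with the lowest cost, where the cost of a penalized source is multiplied by SOURCE_PENALIZE_FACTOR. The function pickBestSource and the per-source cost map are hypothetical; the real method works on AbstractPhysicalNodePtr and StatisticsPtr.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <set>
#include <string>

// Same value as the documented static member.
constexpr double SOURCE_PENALIZE_FACTOR = 1000.00;

// Hypothetical helper: `costs` maps "databaseName:setName" to a raw cost
// (assumed to be derived from the statistics). Returns the key of the
// cheapest source after penalties are applied.
std::string pickBestSource(const std::map<std::string, double>& costs,
                           const std::set<std::string>& penalizedSets) {
    assert(!costs.empty());
    std::string best;
    double bestCost = std::numeric_limits<double>::infinity();
    for (const auto& entry : costs) {
        const bool penalized = penalizedSets.count(entry.first) > 0;
        const double effective =
            penalized ? entry.second * SOURCE_PENALIZE_FACTOR : entry.second;
        if (effective < bestCost) {
            bestCost = effective;
            best = entry.first;
        }
    }
    return best;
}
```

With such a large factor, a penalized source is in practice chosen only when every cheaper alternative has been exhausted or penalized as well.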
bool pdb::PhysicalOptimizer::getNextStagesOptimized | ( | std::vector< pdb::Handle< AbstractJobStage >> & | physicalPlanToOutput, |
std::vector< pdb::Handle< SetIdentifier >> & | interGlobalSets, | ||
StatisticsPtr & | stats, | ||
int & | jobStageId | ||
) |
Returns a sequence of job stages that make up a partial physical plan. After the stages are executed, we gather statistics about the newly created sets and use them to generate the next partial plan.
physicalPlanToOutput | a list where we want to put the sequence of job stages |
interGlobalSets | a list of intermediate sets that need to be created |
stats | the statistics about |
jobStageId | the id of the current job stage |
Definition at line 36 of file PhysicalOptimizer.cc.
bool pdb::PhysicalOptimizer::hasConsumers | ( | Handle< SetIdentifier > & | set | ) |
Returns true if the provided source still has any consumers that we need to process
set | the set identifier of the source to check for consumption by later stages |
Definition at line 84 of file PhysicalOptimizer.cc.
bool pdb::PhysicalOptimizer::hasSources | ( | ) |
Returns true if there are still sources left to process
Definition at line 80 of file PhysicalOptimizer.cc.
PDBLoggerPtr pdb::PhysicalOptimizer::logger |
private |
An instance of the PDBLogger
Definition at line 122 of file PhysicalOptimizer.h.
std::set< std::string > pdb::PhysicalOptimizer::penalizedSets |
private |
Penalized source sets in the form databaseName:setName
Definition at line 117 of file PhysicalOptimizer.h.
constexpr double pdb::PhysicalOptimizer::SOURCE_PENALIZE_FACTOR = 1000.00 |
static private |
This is the factor applied to the cost of the source if penalized
Definition at line 106 of file PhysicalOptimizer.h.
std::map< std::string, AbstractPhysicalNodePtr > pdb::PhysicalOptimizer::sourceNodes |
private |
Map where the key is the name of the source set in the form "databaseName:setName" and the value is the AbstractPhysicalNodePtr associated with it.
Definition at line 112 of file PhysicalOptimizer.h.
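As a small illustration of the "databaseName:setName" key format, the sketch below builds such keys and stores nodes under them. MockNode, makeSourceKey and addSource are hypothetical names introduced for this example; the real map holds AbstractPhysicalNodePtr values and how it is populated is not shown on this page.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the node type behind AbstractPhysicalNodePtr.
struct MockNode {
    std::string database;
    std::string set;
};
using MockNodePtr = std::shared_ptr<MockNode>;

// Builds a key in the documented "databaseName:setName" form.
std::string makeSourceKey(const std::string& databaseName,
                          const std::string& setName) {
    assert(!databaseName.empty() && !setName.empty());
    return databaseName + ":" + setName;
}

// Registers a source node under its key, replacing any previous entry.
void addSource(std::map<std::string, MockNodePtr>& sourceNodes,
               const std::string& databaseName,
               const std::string& setName) {
    sourceNodes[makeSourceKey(databaseName, setName)] =
        std::make_shared<MockNode>(MockNode{databaseName, setName});
}
```

Because penalizedSets uses the same key format, a key built this way could be looked up in both containers.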