ColumnStoreExporter Object

Methods

generateTableStatement

public String generateTableStatement(DataFrame dataFrame)

Returns a DDL CREATE TABLE statement without a database prefix, based on the schema of the submitted DataFrame. The table name is set to “spark_export”.

Parameters:
  • dataFrame – The DataFrame from which the structure for the generated table statement will be inferred.

public String generateTableStatement(DataFrame dataFrame, String database)

Returns a DDL CREATE TABLE statement with a database prefix, based on the schema of the submitted DataFrame. The table name is set to “spark_export”.

Parameters:
  • dataFrame – The DataFrame from which the structure for the generated table statement will be inferred.
  • database – The database name used in the generated table statement.

Note

The submitted database name will automatically be parsed into the ColumnStore naming convention, if not already compatible.

public String generateTableStatement(DataFrame dataFrame, String database, String table)

Returns a DDL CREATE TABLE statement for database.table based on the schema of the submitted DataFrame.

Parameters:
  • dataFrame – The DataFrame from which the structure for the generated table statement will be inferred.
  • database – The database name used in the generated table statement.
  • table – The table name used in the generated table statement.

Note

The submitted database and table names will automatically be parsed into the ColumnStore naming convention, if not already compatible.
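
As a rough illustration of what the overloads above produce, the following Java sketch assembles a CREATE TABLE statement from an ordered map of column names to SQL types. It is not the ColumnStoreExporter implementation: the class TableStatementSketch, its createTable method, the column types, and the exact statement layout (including the ENGINE=columnstore clause) are assumptions made for this example. Only the default table name “spark_export” and the optional database prefix are taken from the descriptions above.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not the ColumnStoreExporter implementation.
final class TableStatementSketch {
    // Default table name, as stated in the method descriptions.
    static final String DEFAULT_TABLE = "spark_export";

    // columns maps column name -> SQL type; use an ordered map to keep column order.
    // A null or empty database yields a statement without a database prefix.
    // A null or empty table falls back to the default table name.
    static String createTable(Map<String, String> columns, String database, String table) {
        String name = (table == null || table.isEmpty()) ? DEFAULT_TABLE : table;
        String qualified = (database == null || database.isEmpty())
                ? name
                : database + "." + name;

        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            cols.add(column.getKey() + " " + column.getValue());
        }
        // Assumed layout; the real generated statement may differ.
        return "CREATE TABLE " + qualified + " " + cols + " ENGINE=columnstore;";
    }
}
```

With the columns id BIGINT and name VARCHAR(64), the database “test”, and no table name, this sketch returns CREATE TABLE test.spark_export (id BIGINT, name VARCHAR(64)) ENGINE=columnstore;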

public String generateTableStatement(DataFrame dataFrame, String database, String table, boolean determineTypeLength)

Returns a DDL CREATE TABLE statement for database.table based on the schema (and content) of the submitted DataFrame.

Parameters:
  • dataFrame – The DataFrame from which the structure for the generated table statement will be inferred.
  • database – The database name used in the generated table statement.
  • table – The table name used in the generated table statement.
  • determineTypeLength – If set to true, the content of the DataFrame is analysed to determine the best SQL data type for each column. Otherwise, reasonable default types are used.

Note

The submitted database and table names will automatically be parsed into the ColumnStore naming convention, if not already compatible.
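
The effect of determineTypeLength can be pictured with the following Java sketch, which picks a column type either from a fixed default or by scanning the column values. The class TypeLengthSketch, the default types BIGINT and TEXT, and the selection rules are all assumptions for illustration; the actual defaults and heuristics of the exporter are not described in this section, and any engine-specific value limits are ignored here.

```java
import java.util.List;

// Hypothetical sketch of content-based type selection; not the exporter's actual rules.
final class TypeLengthSketch {
    // Chooses the narrowest standard signed integer type that holds all values.
    // Values are assumed to be non-null.
    static String integerType(List<Long> values, boolean determineTypeLength) {
        if (!determineTypeLength) {
            return "BIGINT"; // assumed default type
        }
        long min = 0; // 0 fits every integer type, so it is a safe starting point
        long max = 0;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) return "TINYINT";
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) return "SMALLINT";
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) return "INT";
        return "BIGINT";
    }

    // Sizes a VARCHAR to the longest value found in the column.
    static String stringType(List<String> values, boolean determineTypeLength) {
        if (!determineTypeLength) {
            return "TEXT"; // assumed default type
        }
        int maxLength = 1; // avoid VARCHAR(0) for empty columns
        for (String v : values) {
            if (v != null) {
                maxLength = Math.max(maxLength, v.length());
            }
        }
        return "VARCHAR(" + maxLength + ")";
    }
}
```

Scanning the content requires a full pass over the data, which is presumably why the analysis is optional.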

export

public void export(String database, String table, DataFrame df)

Exports the given DataFrame into an existing ColumnStore database.table using the default Columnstore.xml configuration.

Parameters:
  • database – The target database the DataFrame is exported into.
  • table – The target table the DataFrame is exported into.
  • df – The DataFrame to export.

Note

To guarantee that the DataFrame import into ColumnStore is a single transaction that is rolled back in case of an error, the DataFrame is first collected at the Spark master and written to the ColumnStore system from there. It therefore has to fit into the memory of the Spark master.

Note

The schema of the DataFrame to export and the schema of the ColumnStore table to import into have to match. Otherwise, the import will fail.
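
The all-or-nothing behaviour described in the notes above can be sketched in Java with an in-memory stand-in for the target table. This is an illustration under stated assumptions, not the exporter's code: the class SingleTransactionSketch, the representation of a row as a list of strings, and the use of a column-count check to simulate a schema mismatch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: an in-memory stand-in, not the ColumnStoreExporter implementation.
final class SingleTransactionSketch {
    // Rows visible in the hypothetical target table after a commit.
    final List<List<String>> committed = new ArrayList<>();

    // Collects all partitions first, then writes every row or none.
    // A row whose column count differs from tableColumns simulates a schema mismatch.
    boolean export(List<List<List<String>>> partitions, int tableColumns) {
        List<List<String>> collected = new ArrayList<>();
        for (List<List<String>> partition : partitions) {
            collected.addAll(partition); // the whole data set is held in one place
        }
        for (List<String> row : collected) {
            if (row.size() != tableColumns) {
                return false; // rollback: nothing has been committed
            }
        }
        committed.addAll(collected); // commit: all rows become visible together
        return true;
    }
}
```

Because every row passes through the collected list before anything is committed, the memory needed grows with the size of the whole data set, which mirrors the memory requirement stated in the first note.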

public void export(String database, String table, DataFrame df, String configuration)

Exports the given DataFrame into an existing ColumnStore database.table using a specific Columnstore.xml configuration.

Parameters:
  • database – The target database the DataFrame is exported into.
  • table – The target table the DataFrame is exported into.
  • df – The DataFrame to export.
  • configuration – Path to the Columnstore.xml configuration to use for the export.

Note

To guarantee that the DataFrame import into ColumnStore is a single transaction that is rolled back in case of an error, the DataFrame is first collected at the Spark master and written to the ColumnStore system from there. It therefore has to fit into the memory of the Spark master.

Note

The schema of the DataFrame to export and the schema of the ColumnStore table to import into have to match. Otherwise, the import will fail.

exportFromWorkers

public void exportFromWorkers(String database, String table, RDD rdd)

Exports the given RDD into an existing ColumnStore database.table from the worker nodes using the default Columnstore.xml configuration.

Parameters:
  • database – The target database the RDD is exported into.
  • table – The target table the RDD is exported into.
  • rdd – The RDD to export.

Note

Each partition of the RDD is imported into ColumnStore as a single transaction. In case of an error, only the partitions in which the error occurred are rolled back. Already committed partitions remain in the database.

Note

The schema of the RDD to export and the schema of the ColumnStore table to import into have to match. Otherwise, the import will fail.
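
The per-partition transaction semantics described above can be sketched in Java as follows. As before, this is an in-memory illustration and not the exporter's code: the class PartitionTransactionSketch, the row representation, the sequential loop over partitions, and the column-count check used to simulate a failing partition are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: each partition is committed or rolled back on its own.
final class PartitionTransactionSketch {
    // Rows visible in the hypothetical target table.
    final List<List<String>> committed = new ArrayList<>();

    // Returns the indices of the partitions that were rolled back.
    List<Integer> exportFromWorkers(List<List<List<String>>> partitions, int tableColumns) {
        List<Integer> rolledBack = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            List<List<String>> pending = new ArrayList<>();
            boolean ok = true;
            for (List<String> row : partitions.get(i)) {
                if (row.size() != tableColumns) { // simulated schema mismatch
                    ok = false;
                    break;
                }
                pending.add(row);
            }
            if (ok) {
                committed.addAll(pending); // commit this partition only
            } else {
                rolledBack.add(i);         // discard this partition's pending rows
            }
        }
        return rolledBack;
    }
}
```

In this sketch, a failure in one partition leaves the rows of the other partitions in place, so the table can end up partially loaded.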

public void exportFromWorkers(String database, String table, RDD rdd, List<Int> partitions)

Exports the given partitions of the RDD into an existing ColumnStore database.table from the worker nodes using the default Columnstore.xml configuration.

Parameters:
  • database – The target database the RDD is exported into.
  • table – The target table the RDD is exported into.
  • rdd – The RDD to export.
  • partitions – List of partitions, identified by their integer index, to be exported. If an empty List is submitted, all partitions are exported.

Note

Each partition of the RDD is imported into ColumnStore as a single transaction. In case of an error, only the partitions in which the error occurred are rolled back. Already committed partitions remain in the database.

Note

The schema of the RDD to export and the schema of the ColumnStore table to import into have to match. Otherwise, the import will fail.
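
The rule for the partitions parameter, where an empty list selects every partition, can be written down as a small Java helper. The class PartitionSelectionSketch and its resolve method are hypothetical, and rejecting out-of-range indices with an exception is an assumption; this section does not say how the exporter treats invalid indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the "empty list means all partitions" rule.
final class PartitionSelectionSketch {
    // Returns the partition indices that would be exported.
    static List<Integer> resolve(int numPartitions, List<Integer> requested) {
        List<Integer> selected = new ArrayList<>();
        if (requested.isEmpty()) {
            for (int i = 0; i < numPartitions; i++) {
                selected.add(i); // empty request: export every partition
            }
            return selected;
        }
        for (int index : requested) {
            if (index < 0 || index >= numPartitions) {
                // Assumed behaviour; the exporter's handling is not documented here.
                throw new IllegalArgumentException("No such partition: " + index);
            }
            selected.add(index);
        }
        return selected;
    }
}
```

One plausible use of an explicit list is to re-export only the partitions that were rolled back in an earlier run, since the committed ones are already in the database.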

public void exportFromWorkers(String database, String table, RDD rdd, List<Int> partitions, String configuration)

Exports the given partitions of the RDD into an existing ColumnStore database.table from the worker nodes using a specific Columnstore.xml configuration.

Parameters:
  • database – The target database the RDD is exported into.
  • table – The target table the RDD is exported into.
  • rdd – The RDD to export.
  • partitions – List of partitions, identified by their integer index, to be exported. If an empty List is submitted, all partitions are exported.
  • configuration – Path to the Columnstore.xml configuration to use for the export.

Note

Each partition of the RDD is imported into ColumnStore as a single transaction. In case of an error, only the partitions in which the error occurred are rolled back. Already committed partitions remain in the database.

Note

The schema of the RDD to export and the schema of the ColumnStore table to import into have to match. Otherwise, the import will fail.

parseTableColumnNameToCSConvention

public String parseTableColumnNameToCSConvention(String input)

Parses the input String according to the ColumnStore naming convention and returns it.

Parameters:
  • input – The String that is going to be parsed.
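
This section does not spell out the naming convention itself, so the following Java sketch only illustrates the general shape of such a normalisation: names that are already compatible pass through unchanged, and other names are rewritten. The rules used here (ASCII letters, digits and underscores only, a leading letter, at most 64 characters) and the “p_” prefix are assumptions for illustration, as are the class NamingSketch and its normalize method; the actual result of parseTableColumnNameToCSConvention may differ.

```java
// Hypothetical normalisation rules; not the actual ColumnStore convention logic.
final class NamingSketch {
    static final int MAX_LENGTH = 64; // assumed maximum identifier length

    static String normalize(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            boolean allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            out.append(allowed ? c : '_'); // replace every other character
        }
        // Assumed rule: a name must start with a letter, otherwise it gets a prefix.
        if (out.length() == 0 || !Character.isLetter(out.charAt(0))) {
            out.insert(0, "p_");
        }
        if (out.length() > MAX_LENGTH) {
            out.setLength(MAX_LENGTH); // truncate overly long names
        }
        return out.toString();
    }
}
```

Under these assumed rules, normalize("my_table") returns "my_table" unchanged, while normalize("2018 sales") returns "p_2018_sales". Applying the function to its own output changes nothing, which matches the "if not already compatible" wording used in the notes of this reference.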