This application component lets you easily import data into your application from various sources.
- Installation
- Supported DBMS
- Using the application component
- Import wizard
- Supported file types
- Import Configuration
- Import Execution
- Default binding behavior
- Import Limitations
The `data-import` application component is available in the CUBA marketplace. Select a version of the add-on which is compatible with the platform version used in your project:
Platform Version | Add-on Version |
---|---|
7.2.x | 0.12.x - 0.14.x |
7.1.x | 0.10.x - 0.11.x |
7.0.x | 0.8.x - 0.9.x |
6.10.x | 0.7.x |
6.9.x | 0.5.x - 0.6.x |
6.8.x | 0.1.x - 0.4.x |
Add the custom application component to your project:
- Artifact group: `de.diedavids.cuba.dataimport`
- Artifact name: `dataimport-global`
- Version: *add-on version*
```groovy
dependencies {
    appComponent("de.diedavids.cuba.dataimport:dataimport-global:*addon-version*")
}
```
Information on changes across the different versions of the application component can be found in the CHANGELOG. The changelog also contains information about breaking changes and tips on how to resolve them.
The following databases are supported by this application component:
- HSQLDB
- PostgreSQL
- MySQL
All other DBMS supported by CUBA should work as well, since CUBA Studio generates the corresponding init / update scripts within the application.
The `data-import` application component helps you import data into your application from different files. Currently the following file types are supported:
- Excel (`.xlsx`)
- comma separated values (`.csv`)
- JSON (`.json`)
- XML (`.xml`)
In order to configure various import options, there is a UI based configuration that lets you define
- which entity will be imported
- which column maps to which entity attribute
- format configuration for dates, boolean values etc.
- unique configurations and how to deal with uniqueness violations
- custom Groovy scripts for attribute value mapping
There are two modes of using the `data-import` application component. The first one is an interactive UI wizard, which guides the user directly through the process of importing the data. In the second mode, the import configuration is pre-defined by a developer / administrator of the system. The end-user of the system can reuse these configurations and just upload the file that should get imported.
To see this application component in action, check out this example: cuba-example-using-data-import.
The import wizard allows the user to interactively go through the import process and configure the above-mentioned settings
for the import execution. It can be found in the main menu: Administration > Data Import > Import Wizard
The second step in the wizard allows the user to configure which columns of the import file will be mapped to which entity attributes. The system makes suggestions based on the similarities of the entity attribute names and the column headers in the import file, but this can be adjusted by the user if needed.
In the import configuration it is possible to define certain Format options as well as the unique configurations for this import.
The last step will preview the data that was received from the import file. With "Start Import" the import process will be triggered. Afterwards the user will see a summary of how many entities were imported correctly.
In the core module, there is an API available for programmatic interaction with the data import facilities. `DataImportAPI` takes a `FileDescriptor` together with an `ImportConfiguration` and imports the content of the file according to the given configuration.
An example usage can be found in the MlbTeamImportService in the example application.
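A minimal sketch of such a programmatic call is shown below. The exact method name and signature of `DataImportAPI` are assumptions here; consult the `MlbTeamImportService` in the example application for the real usage:

```groovy
import com.haulmont.cuba.core.entity.FileDescriptor

import javax.inject.Inject

class MlbTeamImportBean {

    @Inject
    DataImportAPI dataImportAPI // provided by the data-import add-on (exact package omitted)

    // hands the uploaded file and a pre-defined configuration over to the
    // import facilities; the method name 'importFromFile' is an assumption
    void importTeams(FileDescriptor importFile, ImportConfiguration configuration) {
        dataImportAPI.importFromFile(configuration, importFile)
    }
}
```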
Instead of using the full import wizard, it is also possible to integrate the import process directly into screens. This corresponds to the second usage mode of this application component (as mentioned above): an import configuration is already defined for a particular entity, and you as the developer want the user to start the import process directly from the browse screen of your entities.
NOTE: The `@WithImport` annotation should only be used for CUBA 6 based legacy screens that extend `AbstractLookup`. CUBA 7 based screens extending `StandardLookup<T>` should use the interface based approach and implement `WithImportWizard`.
To start import from your entity browse screen, you have to add the following annotation to your browse screen controller:
```java
@WithImport(listComponent = "customersTable")
public class CustomerBrowse extends AnnotatableAbstractLookup {
}
```
For the `@WithImport` annotation you need to define the list component to which the import button should be added. Normally this is the `id` of the table you defined in your browse screen.
This annotation will create a button in the buttonsPanel of the table and add the Import button after the default CUBA buttons.
The `@WithImport` annotation can be customized through the following attributes:
- `String listComponent` - the id of the list component / table where the button will be added - REQUIRED
- `String buttonId` - the id of the button that will be created (`"importBtn"` by default)
- `String buttonsPanel` - the id of the buttons panel where the new button will be added (`"buttonsPanel"` by default)
When the import button is clicked on the `CustomerBrowse`, it will check if there are import configurations available for this entity. In case there are multiple configurations available for this entity, the user has to select a particular import configuration to proceed.
To start import from your entity browse screen, the screen controller has to implement the following interface:
```java
public class CustomerBrowse extends StandardLookup<Customer> implements WithImportWizard {
}
```
The `WithImportWizard` interface is a replacement for the previously existing `@WithImport` annotation.
It will create a button in the buttonsPanel of the table and add the Import button after the default CUBA buttons.
`WithImportWizard` requires implementing certain methods in order to configure the way the import wizard works:
```java
public class CustomerBrowse extends StandardLookup<Customer> implements WithImportWizard {

    @Inject
    protected GroupTable<Customer> customerTable;

    @Inject
    protected CollectionContainer<Customer> customerDc;

    @Inject
    protected ButtonsPanel buttonsPanel;

    @Override
    public ListComponent getListComponent() {
        return customerTable;
    }

    @Override
    public CollectionContainer getCollectionContainer() {
        return customerDc;
    }

    @Override
    public ButtonsPanel getButtonsPanel() {
        return buttonsPanel;
    }
}
```
Furthermore, the following optional methods can be implemented to configure the behavior of the import wizard further (see the sketch after this list):
- `Map<String, Object> getDefaultValues()` - defines default values for the entity that will be imported
- `String getButtonId()` - the id of the destination button. It will be picked up from the existing XML or created with this identifier.
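As an illustration, a `getDefaultValues()` override for the `CustomerBrowse` above might look like the following Groovy sketch (the `customerType` attribute is hypothetical):

```groovy
@Override
Map<String, Object> getDefaultValues() {
    // every imported Customer receives this value unless the file provides one
    [customerType: CustomerType.REGULAR]
}
```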
When the import button is clicked on the `CustomerBrowse`, it will check if there are import configurations available for this entity. In case there are multiple configurations available for this entity, the user has to select a particular import configuration to proceed.
Multiple file types are supported by this application component. Information and requirements for certain file types will be described below.
Example files can be found in the example-data subdirectory.
For Excel files the first row has to contain the column names. Unnamed columns are currently not supported.
Example Excel file:
Name | Description |
---|---|
Users | This will be the users |
Managers | The moderators |
For CSV files the first row has to contain the column names. Unnamed columns are currently not supported.
Example CSV file:
"Name","Description"
"Users", "This will be the users"
"Moderators", "The Moderators"
For JSON files the content has to be a JSON array, where each entry is a JSON object that should get imported as an entity instance.
Example JSON file:
```json
[
  {
    "Name": "Users",
    "Description": "The users of the system"
  },
  {
    "Name": "Moderators",
    "Description": "The mods of the system"
  }
]
```
It is also possible to have nested structures in the JSON and bind them to an entity attribute. In order to do this, a custom attribute binding script has to be configured for the desired entity attribute.
An example JSON file for this would be:
```json
[
  {
    "Name": "Mark",
    "Lastname": "Andersson",
    "Address": {
      "street": "Dorfkrug 1",
      "postcode": "51665",
      "city": "Bad Neuendorf"
    },
    "orders": [
      {
        "orderId": 1
      },
      {
        "orderId": 2
      }
    ]
  }
]
```
In the custom binding script, access to the nested structure can be achieved like this:
```groovy
return rawValue.Address.street
```
Or, in case of the `orders` array, it would be:

```groovy
return rawValue.orders[0].orderId
```
For XML files the content has to be a list of XML elements directly under the root XML element; each of these elements will get imported as an entity instance.
Example XML file:
```xml
<roles>
  <role>
    <Name>Users</Name>
    <Description>The users of the system</Description>
  </role>
  <role>
    <Name>Moderators</Name>
    <Description>The mods of the system</Description>
  </role>
</roles>
```
It is also possible to have nested structures in the XML and bind them to an entity attribute. In order to do this, a custom attribute binding script has to be configured for the desired entity attribute.
An example XML file for this would be:
```xml
<root>
  <entry>
    <Name>Users</Name>
    <Description>The users of the system</Description>
    <permission>
      <code>ALLOW_EVERYTHING</code>
      <name>Allow everything</name>
    </permission>
  </entry>
  <entry>
    <Name>Moderators</Name>
    <Description>The mods of the system</Description>
    <permission>
      <code>DENY_ALL</code>
      <name>Nothing is allowed</name>
    </permission>
  </entry>
</root>
```
In the custom binding script, access to the nested structure can be achieved like this:
```groovy
return rawValue.permission.code
```
The basis for the import wizard is the `Import Configuration`. It is also available via Administration > Data Import > Import Configuration.
The `Import Configuration` contains all configuration options that are available for a single import process.
Generally the configurations can be saved for later reuse. This is possible within the Import wizard. Alternatively the user can create an import configuration beforehand via the corresponding list.
The base information required for an import configuration is the name, the entity type, and an import file from which the attributes can be parsed.
An entity attribute mapping defines which column / attribute in the import file should be mapped to a particular attribute of the destination entity.
An attribute mapping contains the following information:
- column name in the import file
- column number (only relevant for CSV / Excel)
- entity attribute
- column required flag
When the column required flag is set to true, the add-on validates that the column name exists in the imported file. It does not check the column's content, so you still need to validate that separately.
When creating an import configuration (directly or via the import wizard), the application component will try to parse the import file and, depending on the column names / attribute names, suggest the most appropriate entity attribute that is available. Since this auto-detection feature has limitations, it is recommended to validate that the suggested entity attributes for the mappings are correct before executing the import.
Additionally it is possible to configure a custom binding script, which lets the user implement certain parsing logic / default values in case this is not handled by the default binding behavior.
The return value of this script will be set to the corresponding Entity attribute.
If this script is set, it will disable the auto-detection of the import process.
The return value of the script has to be of the correct type as defined by the corresponding entity attribute.
Within the custom binding script, the following variables are injected and available for usage:
- `rawValue`: the raw value that should be imported from the import file
- `dataManager`: a reference to the DataManager from CUBA
- `dataRow`: the complete data row as it is taken from the import file
- `entityAttribute`: the current entity attribute
- `importConfiguration`: the current import configuration
- `importAttributeMapper`: the current import attribute mapper entity
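As an example, a custom binding script that parses German-formatted decimal numbers could look like this (a sketch, assuming the target entity attribute is a `BigDecimal`):

```groovy
// rawValue is injected by the add-on and contains the raw column value
if (!rawValue) {
    return null
}
// convert German number formatting ("1.234,56") into "1234.56"
return new BigDecimal(rawValue.toString().replace(".", "").replace(",", "."))
```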
Unique configurations in the Import Configuration allow the user to handle business uniqueness scenarios. Sometimes during the import process it is necessary to define what happens if an entity instance with particular attribute values already exists in the database.
Possible results in case of a unique-violation might be to skip this entity instance or to update the existing entity instance that was found in the database.
Within an import configuration it is possible to define multiple unique configurations. For each unique configuration it is possible to define multiple entity attributes which should be taken into consideration.
For every data row the import process will check all unique configurations. If any of those configurations finds an entity that matches the criteria of the data row, the corresponding unique configuration policy (`UniquePolicy`) will be executed:
- `Skip if exists` - the data row will be skipped and not imported
- `Update existing entity` - the values of the data row will update the found entity instance
- `Abort import` - the import process will be aborted immediately. Depending on the transaction strategy, either the entities up until this point will be written (transaction per entity), or no entity at all (single transaction).
The `MlbTeam` entity has the following data in the database:
Code | Name | State |
---|---|---|
BAL | "Baltimore Orioles" | MD |
Now we want to import the following CSV file:
"Code","Name","State"
BAL,"Baltimore New Team",MD
When there is the following unique configuration:
- unique configuration attributes: `code`
- unique policy: `Skip if exists`
The result in the database:
Code | Name | State |
---|---|---|
BAL | "Baltimore Orioles" | MD |
"Code","Name","State"
BAL,"Baltimore New Team",MD
BAL,"Baltimore Orioles",CA
When there is the following unique configuration:
- unique configuration attributes: `code`, `name`
- unique policy: `Update existing entity`
The result in the database:
Code | Name | State |
---|---|---|
BAL | "Baltimore Orioles" | CA |
The transaction strategy is an option that can be configured within the import configuration. It defines how the system should behave in case one of the entries cannot be stored. The following options are available:
- Single transaction: all entities will be imported in one transaction. If an error occurs in any of the entities, no entity will be imported.
- Transaction per entity: every entity will be imported in its own transaction. If an error occurs in one of the entities, all other entities will still be imported.
The Pre-Commit script is a Groovy script which can be defined in the ImportConfiguration. It is executed directly before the already bound entity instance gets imported.
Within this script, the entity can be adjusted, values can be re-written, default values can be set etc.
The following variables are injected and available for usage:
- `entity`: the already bound, but not persisted entity instance
- `dataRow`: the complete data row as it is taken from the import file
- `dataManager`: a reference to the DataManager from CUBA
- `importConfiguration`: the current import configuration
It is also possible to prevent the import for this entity instance.
To do this, the script has to return a boolean value which indicates whether the entity should get imported or not:
- `true` will import the entity instance
- `false` will not import the entity instance
Example of veto right script:
```groovy
if (entity.name.startsWith("B")) {
    entity.name = entity.name + " - with a B"
    return true
} else {
    return false
}
```
NOTE: If there is no explicit return value in the script, Groovy will return the value of the last expression. That might not lead to the expected result. Be aware of that.
Example of an implicit (possibly wrong) veto right script:
```groovy
entity.name = entity.name + " " + entity.code
```
This example will return `null`, because `MlbPlayer.setName()` has a `void` return type, which will be evaluated as false in Groovy. Therefore this entity will not be imported.
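A fixed version of the same script makes the decision explicit:

```groovy
entity.name = entity.name + " " + entity.code
return true // explicit return: always import this entity instance
```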
NOTE: Always use explicit return statements in the pre-commit script.
When executing the data import, the results of the operation will be logged as an Import Execution. Those import executions can be found in the menu: Administration > Data Import > Import Executions. The Import Execution contains information about the import process for a given file. For each failing import row, it contains the following detailed information:
- Category
- Level
- failing data row
- the entity instance after all attributes are bound
- error message
- stacktrace
During the import process the values of the import file have to be bound to the entity attributes. By default the following attribute types are supported in the default binding:
- String
- Integer
- Double
- BigDecimal
- Boolean
- Date (java.util.Date)
For boolean values it is possible to configure within the Import Configuration which values in the import file represent the `true` value and which represent the `false` value. It is e.g. possible to configure `"Yes"` / `"No"` as the values which will be treated as true / false while binding the value.
For Date values, the format can be configured within the Import Configuration as well. It uses the SimpleDateFormat patterns for parsing. Examples:
- `dd.MM.yyyy HH:mm:ss z` would be able to parse values like `07.04.2001 12:08:56 PDT`
- `yy/dd/MM` would be able to parse values like `13/04/07`
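For reference, the first pattern corresponds to plain `java.text.SimpleDateFormat` usage (sketch; an English locale is assumed so that time zone names like "PDT" can be parsed):

```groovy
import java.text.SimpleDateFormat

def format = new SimpleDateFormat("dd.MM.yyyy HH:mm:ss z", Locale.ENGLISH)
Date date = format.parse("07.04.2001 12:08:56 PDT") // yields a java.util.Date
```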
Enum binding is supported automatically. In order to bind a value from the import file to an Enum value, the value has to match the value of an Enum as it is defined. Example:
```java
public enum CustomerPriority {
    LOW,
    MEDIUM,
    HIGH;
}
```
The following binding values would lead to the result:
Value from import file | Binding result |
---|---|
`"HIGH"` | `CustomerPriority.HIGH` |
`"high"` | `CustomerPriority.HIGH` |
`"High"` | `CustomerPriority.HIGH` |
`""` | `null` |
`"VERY_HIGH"` | `null` |
A very important case is importing values for entity references. Entity associations are supported to some degree. For all unsupported cases, the custom attribute binding script can be used.
Many-to-one associations are supported by the default binding. In order to use this behavior, it is required that the entity instance that should get referenced is already in the database.
In order to reference an entity in a N:1 fashion, the entity attribute in the "Entity attribute mapper" has to be set.
Example:
In this example the `MlbPlayer` entity has a reference to the `MlbTeam` entity.
```java
@Entity(name = "ddcdi$MlbPlayer")
public class MlbPlayer extends StandardEntity {

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "TEAM_ID")
    protected MlbTeam team;
}

@Entity(name = "ddcdi$MlbTeam")
public class MlbTeam extends StandardEntity {

    @Column(name = "NAME", nullable = false)
    protected String name;

    @Column(name = "CODE", nullable = false)
    protected String code;
}
```
The import file (CSV) looks like this:
"Name","Team",
"Adam Donachie","ATH"
"Paul Bako","CEN"
"Ramon Hernandez","BAL"
"Kevin Millar","CEN"
"Chris Gomez","MLN"
In this case, the team is referenced by the attribute `code` of the `MlbTeam` entity.
This means, that the Entity attribute mapper for this example would look like this:
- Column name: `Team`
- Entity attribute: `team.code`
It is also possible to bind not only through one association, but through multiple associations. In case `MlbTeam` had an attribute `state` of type `UsState` with an attribute `code`, it would also work. The corresponding entity attribute would be `team.state.code`.
The requirement for this to work is, that there has to be a unique match to identify the correct association.
The following examples would not work:
Let's assume we have the following import file row:
"Name","Team State",
"Adam Donachie","MD"
where "MD" is Maryland and we would like to assign "the" team that is based in Maryland. As you might have noticed, there is an obvious problem already in this sentence: "the team". There might be multiple teams in Maryland, correct (in fact currently there is only one team - the "Baltimore Orioles")?
This is where the uniqueness problems occur.
The following examples will lead to a non-unique result and therefore will not work:
- there are multiple entity instances that have these values (two `UsState` entity instances that have the code "MD")
- there are multiple entity instances that reference an entity which has this value (two `MlbTeam` entity instances that reference the `UsState` entity instance "Maryland")
In case such a situation occurs, the corresponding data row is logged together with all non-unique results. Nothing will be bound in this case.
Currently, binding of 1:N / M:N entity associations is not supported automatically. Instead, the custom attribute binding script can be used for this purpose.
An example use case can be found in the example project cuba-example-using-data-import.
There, the following example shows the behavior of M:N association binding: `MlbPlayer -- M:N --> BaseballStrength`.
In particular, there are the following example configurations within the directory example-data/mlb/mlb_players-with-strengths:
- MlbPlayer Import Configuration with custom strengths binding (ImportConfiguration-mlb-player-with-strengths.json)
- MlbPlayer Import Attribute Mapper with custom strengths binding (ImportAttributeMapper-mlb-player-with-strengths.json)
- MlbPlayer CSV Import file (mlb_players-with-strengths.csv)
Note: The Baseball Strengths master data file has to be imported first.
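To give an idea of what such a custom binding script can look like, here is a sketch (not the exact script from the example project) that resolves a comma-separated list of strength codes into existing `BaseballStrength` instances via the injected `dataManager`; the entity name and the `code` attribute are assumptions:

```groovy
import com.haulmont.cuba.core.global.LoadContext

if (!rawValue) {
    return []
}

// look up one existing BaseballStrength instance per code in the cell
return rawValue.toString().split(",").collect { code ->
    def loadContext = LoadContext.create(BaseballStrength)
            .setQuery(LoadContext.createQuery(
                    'select s from ddcdi$BaseballStrength s where s.code = :code')
                    .setParameter('code', code.trim()))
    dataManager.load(loadContext)
}
```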
Dynamic attributes are supported as a binding target. Currently the following dynamic attribute datatypes are supported:
- String
- Integer
- Double
- Boolean
- Date (java.util.Date)
- Enumeration
NOTE: Entity references within dynamic attributes are not supported currently.
In order to configure a dynamic attribute the Entity attribute mapper has to be configured with a plus sign as a prefix of the dynamic attribute name:
Let's assume the entity `MlbTeam` has a dynamic attribute category `Stadium Information`. Within this category, there is one dynamic attribute defined with the name `stadiumName`. In this case the entity attribute in the Entity attribute mapper would be: `+stadiumName`
Integration between systems is oftentimes highly dependent on the system / process to integrate with. The source and destination data sources often differ to a high degree.
This application component solves some of the problems that arise during this transformation from the source to the target data source either automatically or via configuration mappings. However, there are a lot of cases, where this kind of configuration is not enough.
Due to this, there is the possibility to create custom scripts like the `preCommitScript`, which enables further customizations.
However, sometimes the mapping exceeds these limits, either because of particular limitations of the configuration or because the scripts are not able to handle every use case.
A few examples of those limitations for the data-import application component are:
- dealing with composite keys
- automatic handling of M:N associations
- interacting with highly complex Excel sheets that are far away from a BCNF database schema
In case custom parsing behavior of the original file is needed, that cannot be configured via the import configuration UI, it is oftentimes still possible to do it programmatically.
There is an example project, cuba-example-data-import-custom-parsing-logic, that shows how to switch the separator character in the CSV import case to `;`. More information can be found in the corresponding README.
In those situations you should follow this general advice: instead of relying on the data-import application component to do all the heavy lifting, treat it only as the first step in your data integration pipeline.
Consider the following complex excel sheet:
This is a common usage pattern for Excel sheets. Furthermore, it is quite hard to convert automatically, since there are so many violations of a normalized data model.
Imagine there is the following destination data model:
```java
public class Customer extends StandardEntity {
    private String name;
    private String customerId;
    private List<Order> orders;
    private CustomerType customerType;
}

public class Order extends StandardEntity {
    private LocalDate orderDate;
    private Customer customer;
    private BigDecimal totalAmount;
}

public enum CustomerType {
    REGULAR,
    PREMIUM;
}
```
Transforming the original Excel sheet into the destination data model is perhaps possible, but not the most straightforward thing to do.
Instead, create a staging area as an entity that exactly mirrors the structure of the source Excel sheet:
```java
public class CustomerOrderRow extends StandardEntity {
    private String customerName;
    private BigDecimal salesNorth;
    private String salesNorthNotes;
    private BigDecimal salesWest;
    private String salesWestNotes;
    // ...
    private String orderIdsInformation;
    private String customerContactTelefonnumber;
    private String customerTypeChanged;
}
```
This way you can still leverage the data-import component without hitting its limits. The logic to transform the source data model into the destination data model, on the other hand, can be expressed as regular Java / Groovy code as part of the application.
And since you now have a persistent entity acting as a staging area, you can apply the following additional functionality:
- allow users to do data clean up directly in the staging area
- use entity listeners & services
- test transformations with unit tests
This way you can leverage the app component for the file import, while treating the transformation logic as a regular part of the application. A sketch of such a transformation follows.
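The transformation itself can then be a plain service method; the following Groovy sketch (all names are illustrative) uses CUBA's `DataManager` fluent API:

```groovy
import com.haulmont.cuba.core.global.DataManager

import javax.inject.Inject

class CustomerStagingTransformation {

    @Inject
    DataManager dataManager

    void transform() {
        // load all staged rows that were imported via the data-import component
        List<CustomerOrderRow> rows = dataManager.load(CustomerOrderRow).list()

        rows.each { row ->
            Customer customer = dataManager.create(Customer)
            customer.name = row.customerName
            // illustrative mapping rule derived from the flat staging columns
            customer.customerType = row.customerTypeChanged ?
                    CustomerType.PREMIUM : CustomerType.REGULAR
            dataManager.commit(customer)
        }
    }
}
```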