Keboola WIKI‎ > ‎Transformations‎ > ‎MySQL‎ > ‎

Array Splitting

Description: This transformation is used to split array-like fields (such as “item1,item2,item3”) and create a separate table with the items.


Examples: Some payment systems such as Recurly give you list of invoices that the payment is related to in one field as comma delimited array. Also, e-commerce order data from Google Analytics tend to come in that format. If you need list of invoices or order items as a separate table extracted from those fields, this is the template for you.


Tables:

  • <input> - table with dates

Screen Shot 2015-03-02 at 9.40.12 .png


  • <incremental> - help table with auto increments (make sure it is larger than the longest array - it can be WAY larger without affecting performance) - Note: optional - as per Petr’s nagging I added a quick generation of the table to the code - so choose whether you prefer to build it or to bring it in

Screen Shot 2015-03-02 at 9.40.30 .png


  • `output` - new output table

Screen Shot 2015-03-02 at 9.40.46 .png


Variables:

  • @delimiter - string separating the values, typically ‘,’, ‘;’ etc. (can be multi-character)


Columns:

  • <input>.<id> - unique ID column from <input> table

  • <input>.<array> - column containing the array to be splitted from <input> table

  • output.`inputId` - id of the row with the array in the input table

  • output.`item` - extracted item from the array

  • output.`rowID` - position of the item in the original array



#set the delimiter value

SET @delimiter = ',';


#create the `incremental` table in case you are not bringing it in, make sure it’s big enough for your array content

CREATE TABLE `incremental`

(`id` INTEGER PRIMARY KEY AUTO_INCREMENT);


INSERT INTO `incremental`

(`id`)

VALUES

(1),(2),(3),(4),(5);



#add and pre-calculate size of arrays to avoid repeated use of text functions later

ALTER TABLE `input`

ADD COLUMN `delCnt` INTEGER;


UPDATE `input`

SET `delCnt` = LENGTH(`array`) - LENGTH(REPLACE(`array`,@delimiter,'')) / LENGTH(@delimiter);


#that’s it

CREATE TABLE `output` AS

SELECT

i.`id` AS `inputId`,

ii.`id` AS `rowID`,

SUBSTRING_INDEX(SUBSTRING_INDEX(i.`array`,@delimiter,ii.`id`),@delimiter,-1) AS `item`

FROM `input` i JOIN `incremental` ii ON ii.`id` <= i.`delCnt` + 1;




Comments