A slick migration trick - convert columns to multi-value field with subfields

By John Locke on December 12, 2017

In the previous post on A custom quantity price discount for Drupal Commerce we created a compound field for price breaks, which was composed by two subfields. One for a quantity threshold, the other for the price at that threshold.

That post covered everything needed to get quantity discounts working within Drupal Commerce, but for this particular project, we also had to find a way to populate these price breaks through the integration with their Sage accounting system.

Source Data

We're starting with their existing CSV exports, and one of them is a file that provides up to 5 pricebreaks for each project. These files get copied up to the webserver each hour, and then the migration into Drupal will get triggered.

Here's what the data looks like in the source CSV file:

Pricebreak CSV Data

... This file has other pricing data, for customer-specific pricing, so our migration skips those rows. The key thing is that there are 5 pairs of columns for the threshold and pricing, called "BreakQuery1" - "BreakQuery5" and "DiscountMarkup1" - "DiscountMarkup5". They use a very large number (99999999) for the largest tier.

So our migration needs to match these up into pairs, and add each pair as a delta in the price break field for that product variation.

Migration Configuration

Setting up migrations is covered elsewhere, as is importing/exporting configurations. For a project like this, we simply copy an existing migration configuration to a new file, edit it with a text editor, and import it into the site's configuration. For this migration, that was all we needed -- there is no supporting code whatsoever, the entire migration is handled with just a YAML file.

We're going to name this migration after the source file, im_pricecode, so the basic setup goes like this:

Export the site configuration.
Copy another CSV migration file to a new "migrate_plus.migration.im_pricecode.yml" file.
Inside the file, delete the uuid line, and set the id to "im_pricecode". When the configuration is imported, Drupal will assign a new UUID, which will get added when you next export the configuration, and the id needs to match the filename.
Add migration tags/group as appropriate.

That's the basic setup. There are 3 crucial parts to a migration configuration: source, process, and destination. All of the heavy lifting here is in the process block, so we'll tackle that last.

Here's our source section:

source:

  plugin: csv

  path: /path/to/IM_PriceCode.csv

  delimiter: ','

  enclosure: '"'

  header_row_count: 1

  keys:

    - ItemCode

    - PriceCodeRecord

  track_changes: true

This section needs to get set to the data source. (We already have plans to swap this for a webservice call down the road). In this case, we're using the "csv" source plugin provided by the Migrate Source CSV module. Of particular note here:

header_row_count here indicates that the first line of the file contains column headers, which will be automatically available on the source $row object in the migration, for processing.
keys need to specify a field or set of fields that uniquely identify the row in the CSV file. In our case, the file also contains customer-specific pricing data -- this makes the ItemCode appear multiple times in the file, so we need to add the second column to get a unique key.
track_changes stores a hash of the source row in the migration table. On future migration runs, the source row is checked against this hash to determine if there's a change -- if there is, the row will get updated, if not, it will be skipped.

Next, the Destination:

destination:

  plugin: 'entity:commerce_product_variation'

  destination_module: commerce_product

  default_bundle: default

migration_dependencies:

  required:

    - ci_item_variation

The destination is pretty self-explanatory. However, there's one crucial thing here: we are running this migration on top of entities that have already been created by another migration, "ci_item_variation". We need to declare that this migration runs after ci_item_variation, by using migration_dependencies.

Finally, we get to our "process" block. Let's take this a field at a time:

process:

  variation_id:

    -

      plugin: migration_lookup

      migration: ci_item_variation

      source: ItemCode

      no_stub: true

    -

      plugin: skip_on_empty

      method: row

variation_id is the primary key for commerce_product_variation entities. We are running this through two plugins: migration_lookup (formerly known as just migration) and skip_on_empty.

The migration_lookup loads up the existing product_variation, by looking up the variation_id on the previous ci_item_variation migration. We use "no_stub" to make sure we don't create any new variations from possible bad data or irrelevant rows. The second plugin, skip_on_empty, is another safety check to be sure we're not migrating bad data in.

pricing_method:

  -

    plugin: static_map

    source: PricingMethod

    map:

      O: O

    default_value: ''

  -

    plugin: skip_on_empty

    method: row

PricingMethod is what indicates the type of pricing each row in the source file contains. We don't need to migrate this field, but we do want to use it as a filter -- we only want to migrate rows that have this set to "O". So we chain together two more plugins to achieve this filtering, and we assign the result to a dummy field that won't ever get used.

The first static_map plugin maps the "O" rows to "O", and everything else gets set to an empty string.

The next plugin, skip_on_empty, simply skips rows that have the empty string set -- which is now everything other than rows with an "O" in this column.

Now we get to the fun stuff:

break1:

  plugin: get

  source:

    - BreakQuantity1

    - DiscountMarkup1

break2:

  plugin: get

  source:

    - BreakQuantity2

    - DiscountMarkup2

break3:

  plugin: get

  source:

    - BreakQuantity3

    - DiscountMarkup3

break4:

  plugin: get

  source:

    - BreakQuantity4

    - DiscountMarkup4

break5:

  plugin: get

  source:

    - BreakQuantity5

    - DiscountMarkup5

The whole trick to getting the 10 columns collapsed into up to 5 instances of our quantity price break field, each with two sub-fields, is to rearrange the data into a format that matches the field. This involves creating two layers of dummy fields. The first layer are these 5 dummy fields, break1 - break5, which group the BreakQuantityN and DiscountMarkupN fields together. The resut are these 5 dummy fields that each have a BreakQuantity and a DiscountMarkup.

One crucial point here is the order -- these are now set as an array with two values. If we take a look at the source data, the row for ItemCode 23311 will have an array that looks like:

$row->break1 = [
0 => "9",
1 => "5.1",
];
$row->break2 = [
0 => "99999999",
1 => "4.85",
];
$row->break3 = [
0 => "0",
1 => "0",
];
...

Remember that each of these is an indexed array, with 0 and 1 as keys.

Now we create our second layer -- a dummy field that combines the other 5 dummy fields:

breaks:

  plugin: get

  source:

    - '@break1'

    - '@break2'

    - '@break3'

    - '@break4'

    - '@break5'

Prefixing a fieldname with "@" means use the row from the destination (the one we're building up) and not the original source row -- this lets us access the dummy fields we just created. The YAML parser requires them to be quoted.

So now we have a value built up that pretty much matches the entity field structure:

$row->breaks = [
0 => [
    0 => "9",
    1 => "5.1",
],
1 => [
    0 => "99999999",
    1 => "4.85",
],
2 => [
    0 => "0",
    1 => "0",
],
...
];

Now we assign this to our real destination field:

field_qty_price_breaks:

  plugin: iterator

  source: '@breaks'

  process:

    threshold:

      plugin: skip_on_empty

      method: process

      source: '0'

    price: '1'

For the actual field values, we need to map the values to the actual subfields -- and we want to skip the empty pricebreak columns. The iterator plugin lets us loop through the breaks, and then assign the subfields with further processing.

The threshold sub-field is 0 if this break is unused, so we can use "skip_on_empty" to skip it -- this time we use the "process" method to only skip this field, and not the entire row. Because the threshold is the first value in the indexed array for each pricebreak, this is the "0" source (and needs to be quoted to be interpreted as an index).

The price sub-field is simply the second value in each of the pricebreak fields, with index "1".

Migrate is cool!

We love working with migrate module in Drupal 8 because of how powerful it is, and how easy to customize. We routinely create process plugins to quickly do some custom transformations, or create source plugins to connect to custom data sources, and I thought I would have to do one of those to solve this problem -- but it turns out some YAML and dummy fields worked with no additional code, pretty much the first time!

Hat tip to MTech for the basic technique! If you have an improvement or suggestion for other cool migrate tricks, leave a comment below! Or if you have a data integration you need done, reach out...

Topic

Integrations with other systems

Add new comment

Your name

The content of this field is kept private and will not be shown publicly.

Homepage

Notify me when new comments are posted

All comments

Replies to my comment

Comment

Filtered HTML

Web page addresses and email addresses turn into links automatically.
Allowed HTML tags: <a href hreflang> <em> <strong> <blockquote cite> <cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h1> <h2 id> <h3 id> <h4 id> <h5 id> <p> <br> <img src alt height width>
Lines and paragraphs break automatically.

Drupal Canvas — Block HTML (locked)

Allowed HTML tags: <strong> <em> <u> <a href> <p> <br> <ul> <ol> <li>

Drupal Canvas — Inline HTML (locked)

Allowed HTML tags: <strong> <em> <u> <a href>

A slick migration trick - convert columns to multi-value field with subfields

Source Data

Migration Configuration

Migrate is cool!

Topic

Tags

Comments

How are you previewing the…

Inside the process function?…

Excellent! Thank you John!

Add new comment

Filtered HTML

Drupal Canvas — Block HTML (locked)

Drupal Canvas — Inline HTML (locked)

More Like This

Deploying blocks and content to other site environments

New Year, New Website!

A slick migration trick - convert columns to multi-value field with subfields

Added D8 Rules support to Matrix API

Fixing broken URL Aliases on a Drupal 6 -> Drupal 8 migration

Importing foreign key references with Migrate