New Download

With release 14, the download mechanism was fully revised. The former insight-download application is obsolete and should be removed from the deployments directory. The table DownloadData in database tempdb is likewise obsolete and can be dropped. Insight properties that forward to insight-download must be deleted.

Deployment

To prepare for scalability requirements, download functionality is now distributed across multiple applications, so the following deployments are required:

  • crawl
  • download
  • download-gateway
  • download-roots
  • download-schedule
  • insight-change-events

The default bulk size (the maximum number of datasets downloaded per request) is 100. To change the bulk size, set the environment variable DEFAULT_BULK_SIZE to the desired value.
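To illustrate what the bulk size means in practice, here is a minimal sketch that reads DEFAULT_BULK_SIZE with a fallback of 100 and splits a list of dataset ids into requests of at most that size. The function names `bulk_size` and `split_into_bulks` are hypothetical, and how the download applications evaluate the variable internally is not documented here.

```python
import os


def bulk_size(default: int = 100) -> int:
    """Return the bulk size from DEFAULT_BULK_SIZE, or the default if unset."""
    raw = os.environ.get("DEFAULT_BULK_SIZE")
    if raw is None or not raw.strip():
        return default
    value = int(raw)
    if value < 1:
        raise ValueError("DEFAULT_BULK_SIZE must be a positive integer")
    return value


def split_into_bulks(dataset_ids, size):
    """Split dataset ids into consecutive bulks of at most `size` ids each."""
    return [dataset_ids[i:i + size] for i in range(0, len(dataset_ids), size)]
```

With a bulk size of 25, a list of 60 datasets would be fetched in three requests of 25, 25 and 10 datasets.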

CacheDuration

A tree configuration's download section can contain a cacheDuration. If set, downloads will be held in the database cache for the given amount of time.

    "name": "some-tree",
    "download": {
      "mode": "auto",
      "cacheDuration": "2 hours"
    },

While ISO-8601 duration notation is supported (e.g. PT15M for 15 minutes), such values are hard to write and read. The recommended format is "number space unit" as shown in the example, where number is an integer and unit is one of days, hours, or minutes.
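The two notations can be compared with a small sketch that converts either form into a Python `timedelta`. This is not the product's parser: the function name `parse_cache_duration` is hypothetical, only the plural units named above are accepted, and only the day, hour and minute parts of ISO-8601 durations are handled.

```python
import re
from datetime import timedelta

# Units of the recommended "number space unit" format.
_UNITS = ("days", "hours", "minutes")
_SIMPLE = re.compile(r"^(\d+) (\w+)$")
# Assumption: a subset of ISO-8601 with days, hours and minutes only.
_ISO = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")


def parse_cache_duration(text: str) -> timedelta:
    """Convert '2 hours' or 'PT15M' style values into a timedelta."""
    text = text.strip()
    match = _SIMPLE.match(text)
    if match:
        number, unit = match.groups()
        if unit not in _UNITS:
            raise ValueError(f"unsupported unit: {unit!r}")
        return timedelta(**{unit: int(number)})
    match = _ISO.match(text)
    if match and any(match.groups()):
        days, hours, minutes = (int(g) if g else 0 for g in match.groups())
        return timedelta(days=days, hours=hours, minutes=minutes)
    raise ValueError(f"unsupported duration: {text!r}")
```

Under these assumptions, `parse_cache_duration("2 hours")` and `parse_cache_duration("PT2H")` yield the same duration.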

Scheduling

Downloads can be calculated in advance, so that when a client requests a download, it can be delivered immediately. Calculation in advance is achieved with the help of so-called schedule plans.

If a tree is to be calculated in advance, its configuration's download section must name the schedule plan.

    "name": "my-tree",
    "download": {
      "mode": "auto",
      "schedule": "nightly"
    },
    "root": {
      ....

This way, tree my-tree is configured to be calculated in advance according to the schedule plan named nightly. Multiple trees can follow the same schedule plan. If a tree should follow multiple schedule plans, list notation can be used.

For scheduled downloads there is no need to configure any cacheDuration.

Naming the schedule plans that a tree should follow is necessary but not sufficient; the schedule itself still has to be defined. Schedule plans are defined by insight properties prefixed with "schedule". The schedule plan nightly from the example above could be defined like this:

    #insight properties
    schedule.nightly=every day; at 03:00

Daily, weekly and monthly intervals are supported. Note that weekday names must be capitalized. The rules are best explained by examples:

    #insight properties
    schedule.nightly=every day; at 03:00
    schedule.planA=every day; but Saturday; at 21:30
    schedule.planB=every day; but Saturday,Sunday; at 22:35
    schedule.planC=every Monday; at 05:45
    schedule.ultimo=every month; on the -1st; at 22:00
    schedule.firstSundayOfMonth=every month; on the 1st Sunday; at 12:00
    schedule.lastSundayOfMonth=every month; on the -1st Sunday; at 12:00

After modifying or adding schedule plans, download-schedule has to be restarted for the changes to take effect.
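The monthly rules in the examples above use negative ordinals. The following sketch shows one plausible reading of them, suggested by the plan names ultimo and lastSundayOfMonth: "-1st" is taken as the last day of the month and "-1st Sunday" as the last Sunday. The function names are hypothetical and the actual evaluation in download-schedule may differ.

```python
import calendar
from datetime import date

# Capitalized weekday names, Monday first (matches date.weekday()).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def nth_day_of_month(year: int, month: int, n: int) -> date:
    """Positive n counts from the start of the month, negative n from the end."""
    days_in_month = calendar.monthrange(year, month)[1]
    day = n if n > 0 else days_in_month + n + 1
    if n == 0 or not 1 <= day <= days_in_month:
        raise ValueError(f"no such day in {year}-{month:02d}: {n}")
    return date(year, month, day)


def nth_weekday_of_month(year: int, month: int, weekday: str, n: int) -> date:
    """Return the n-th given weekday of the month; n = -1 is the last one."""
    target = WEEKDAYS.index(weekday)  # ValueError if not capitalized
    days_in_month = calendar.monthrange(year, month)[1]
    matches = [date(year, month, d) for d in range(1, days_in_month + 1)
               if date(year, month, d).weekday() == target]
    if n == 0 or abs(n) > len(matches):
        raise ValueError(f"no such {weekday} in {year}-{month:02d}: {n}")
    return matches[n - 1] if n > 0 else matches[n]
```

Under this reading, for January 2024 the plan firstSundayOfMonth would fire on 7 January and lastSundayOfMonth on 28 January, while ultimo would fire on 29 February in February 2024.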

Download for each relevant User-Profile

download-schedule triggers the download of all trees with a given schedule plan at the scheduled time. Less obvious is how many copies of a tree will be downloaded.

The number of copies depends on two factors: the tree's queryParams and which users have logged in. queryParams divide application data into equivalence classes. For each equivalence class that at least one user has logged in for, a download of that tree is calculated. Example: given a tree with a single queryParam site, and given that the site of every previously logged-in user is either HD or WHM, two copies are downloaded: site=HD and site=WHM. When a user later requests the download, the copy for that user's site is delivered.

Warning: Careless configuration of schedule plans can lead to huge amounts of download data and the heavy load associated with it. It is generally a bad idea to associate schedule plans with personal trees (trees that have login in their queryParams) or with trees with list-like queryParams (queryParams whose values are lists), because the number of equivalence classes can be very high.
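The counting rule described above can be sketched in a few lines: each distinct combination of queryParam values among previously logged-in users yields one scheduled download. The user records and the function name `download_variants` are hypothetical and serve only to illustrate the arithmetic.

```python
def download_variants(query_params, known_users):
    """Return the distinct queryParam value combinations, one per download."""
    variants = {tuple(user[param] for param in query_params)
                for user in known_users}
    return sorted(variants)


# Hypothetical profiles of users who have logged in before.
users = [
    {"login": "alice", "site": "HD"},
    {"login": "bob", "site": "WHM"},
    {"login": "carol", "site": "HD"},
]

by_site = download_variants(["site"], users)    # two classes: HD and WHM
by_login = download_variants(["login"], users)  # one class per user
```

With site as the only queryParam, three users collapse into two downloads. With login as a queryParam, every user forms a separate class, which is why personal trees scale poorly with schedule plans.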

Deleting unwanted downloads

If downloads are cached because of a cacheDuration or a schedule, they persist until their expiration date is reached. This expiration date is calculated at download/schedule time. If a configuration is changed and you want to get rid of the now obsolete cached downloads, you can delete them in your browser at http://localhost:8080/download/admin.
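Why a configuration change does not affect already cached downloads can be seen in a minimal sketch for the cacheDuration case: the expiration date is fixed when the download is created and is not recomputed afterwards. The class and function names are hypothetical, and how the expiration of scheduled downloads is derived is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CachedDownload:
    tree: str
    expires_at: datetime  # fixed at download time, never recomputed


def cache_download(tree: str, cache_duration: timedelta,
                   now: datetime) -> CachedDownload:
    """Store the expiration date derived from the duration valid right now."""
    return CachedDownload(tree=tree, expires_at=now + cache_duration)


def is_expired(entry: CachedDownload, now: datetime) -> bool:
    """A cached download is obsolete once its stored expiration date is reached."""
    return now >= entry.expires_at
```

A download cached at 08:00 with a cacheDuration of 2 hours would remain valid until 10:00 in this sketch, even if the tree's cacheDuration were shortened at 08:30.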

Incorporating incremental changes

Currently this feature is only supported for Maximo backends.

Calculating downloads in advance has one drawback: data accuracy. Changes made between the precalculation and the request time are not reflected in the download. So-called delta changes help to mitigate the problem. Instead of downloading the entire content again, only the changes since the most recent scheduled run are calculated and incorporated into the download. This way, data can be kept accurate without causing heavy load or high bandwidth usage.

This mechanism is limited by the fact that only changes within root objects of the tree are considered. Furthermore, the launchpoint of the root type, which propagates changes to the insight middleware, must be activated.
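The idea of incorporating delta changes can be sketched as merging a list of changes into a precalculated snapshot keyed by root object id. The data shapes, the operation names "upsert" and "delete", and the function name `apply_delta` are assumptions made for illustration; the actual change format exchanged between the applications is not documented here.

```python
def apply_delta(download, changes):
    """Return a copy of the precalculated download with the changes applied.

    download: dict mapping root object id -> root object data
    changes:  list of dicts with keys "op", "id" and, for upserts, "data"
    """
    merged = dict(download)  # leave the original snapshot untouched
    for change in changes:
        if change["op"] == "delete":
            merged.pop(change["id"], None)
        elif change["op"] == "upsert":
            merged[change["id"]] = change["data"]
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
    return merged


# Hypothetical nightly snapshot and the changes recorded since then.
snapshot = {"WO-1": {"status": "OPEN"}, "WO-2": {"status": "OPEN"}}
changes = [
    {"op": "upsert", "id": "WO-1", "data": {"status": "CLOSED"}},
    {"op": "delete", "id": "WO-2"},
    {"op": "upsert", "id": "WO-3", "data": {"status": "OPEN"}},
]
current = apply_delta(snapshot, changes)
```

Only the three changed root objects are processed here, rather than the whole tree being calculated again.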

In order to provide and incorporate incremental changes the following applications must be deployed:

  • insight-reconciler
  • insight-recent-changes

If you do not want delta changes to be incorporated into precalculated downloads, you can turn them off:

    "name": "my-tree",
    "download": {
      "mode": "auto",
      "schedule": "nightly",
      "delta": "disabled"
    },
    "root": {
      ....