GitHub and large data sets

Set up GitHub

  • Create a new access token associated with your GitHub profile on the page: https://github.com/settings/tokens
    When creating the token, make sure you select the public_repo subcategory or the repo category, depending on the type of repository that will host your input data.
    Keep in mind that GitHub shows the token only once, so copy it and store it somewhere safe.
  • Create a repository that will host your input data, if you don’t have one already
  • Commit and push a data source file. It can be a CSV, JSON or XML file. We will use a CSV file for this example
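
Before wiring the token into a test, it can be useful to confirm by hand that it can read the file. The following is a minimal Python sketch that runs outside API Fortress and is not part of the product. It assumes the GitHub REST "repository contents" endpoint with the raw media type. The owner, repository, path, branch and token values are hypothetical placeholders.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_contents_request(owner, repo, path, token, branch="main"):
    """Build a request for the raw content of one file in a repository.

    All argument values are placeholders; substitute your own.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/contents/{path}?ref={branch}"
    headers = {
        # Personal access token created in the first step
        "Authorization": f"token {token}",
        # Ask for the file body itself instead of JSON metadata
        "Accept": "application/vnd.github.raw+json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_file(owner, repo, path, token, branch="main"):
    """Download the file and return its text (performs a network call)."""
    request = build_contents_request(owner, repo, path, token, branch)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling fetch_file("your-user", "your-repo", "input.csv", "<your token>") should return the CSV text if the token has the right scope. An HTTP 401 or 404 error usually points to a wrong token, scope or path.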

Create a simple test

  • Create a test
  • Introduce the GitHub component and configure it accordingly
  • The system will retrieve the document, parse it as a CSV file and assign it to the inputData variable
  • (optional) Verify that everything is set up correctly by adding a comment that prints the parsed data
  • (optional) Run the test
    NOTE: API Fortress will parse a CSV file as an array (rows) of arrays (columns), so access to the data is positional.
  • Now, let’s iterate over a subset of this input set. Introduce a selection strategy if necessary.
    For example, inputData.pick(5) will iterate over a subset of 5 randomly selected items. Other strategies are described in Appendix A
  • (optional) Within each iteration, we suggest introducing a comment that identifies the item you’re looking at, which helps when debugging a failure
  • Use the data to perform your HTTP call
  • Introduce some assertions, you know, that’s what you use API Fortress for…
  • And run it
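
The flow above can be sketched outside the platform to show what "positional access" means in practice. This is an illustrative Python sketch, not API Fortress code. The sample rows, the column layout (id, name, email) and the helper names are all hypothetical.

```python
import csv
import io
import random

# Hypothetical input file: no header row, columns are id, name, email
SAMPLE_CSV = (
    "1,Alice,alice@example.com\n"
    "2,Bob,bob@example.com\n"
    "3,Carol,carol@example.com\n"
    "4,Dave,dave@example.com\n"
    "5,Erin,erin@example.com\n"
    "6,Frank,frank@example.com\n"
)


def parse_csv(text):
    """Parse CSV text into an array (rows) of arrays (columns)."""
    return [row for row in csv.reader(io.StringIO(text))]


def pick(rows, n, rng=random):
    """Return n randomly selected rows (or all rows if fewer exist)."""
    return rng.sample(rows, min(n, len(rows)))


def describe(row):
    """Build the per-iteration debug comment; columns are read by position."""
    return f"Testing item id={row[0]} name={row[1]}"


input_data = parse_csv(SAMPLE_CSV)
for row in pick(input_data, 5):
    # In a real test, this is where the HTTP call and assertions would go
    print(describe(row))
```

Because the rows carry no column names, a change in the column order of the file silently changes what row[0] and row[1] refer to, so the debug comment is a cheap way to spot such a mismatch.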

Appendix A: selection strategies

Simple selectors

  • None. If the number of iterations is greater than 100, the system will randomly select 100 elements, unless you override the maximum iterator size.
  • Pick(n). Asks the system to randomly select an n-sized sample of elements.
    Example: inputData.pick(5)
  • Slice. If you’re interested in using a specific slice of data from the inputData, you can slice it according to your needs.
    Example: inputData[10..20] (will select items from index 10 to index 20)
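
To make the selector semantics concrete, here is a small Python emulation of the "None" and "Slice" behaviors. It is a sketch under assumptions, not the platform's implementation. It assumes the slice expression follows Groovy range semantics, where 10..20 includes both endpoints, and it does not try to reproduce the order in which the platform returns randomly selected elements.

```python
import random

MAX_ITERATIONS = 100  # default cap described under "None"


def default_selection(items, max_iterations=MAX_ITERATIONS, rng=random):
    """Emulate the "None" strategy: cap large inputs at a random sample."""
    if len(items) <= max_iterations:
        return list(items)
    return rng.sample(items, max_iterations)


def inclusive_slice(items, first, last):
    """Emulate inputData[first..last], assuming both ends are included.

    Python slices exclude the end index, hence the + 1.
    """
    return items[first:last + 1]


data = list(range(1000))
print(len(default_selection(data)))       # capped at 100 elements
print(len(inclusive_slice(data, 10, 20)))  # indices 10 through 20
```

Under the inclusive reading, inputData[10..20] yields 11 items rather than 10, which matters when slices are meant to sit next to each other without overlapping.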

 

Advanced Slicing

Assume you have a 1000-line CSV file and you need to use every line. While this is technically possible (by overriding the maximum number of iterations), the usefulness of the test depends on:

  1. How long each HTTP request takes
  2. How complex the test is going to be
  3. The number of errors the test may trigger

Moreover, the resulting document may become harder to read when you are trying to debug an issue.

Here’s a slicing technique we suggest to mitigate these issues.

  • Introduce the following 2 variables in the global parameters: offset and limit
  • Use the following expression in your each statement:
    inputData[offset.toInteger()..offset.toInteger()+limit.toInteger()]
    Which reads: slice inputData from the offset index to the offset+limit index

    Note: the toInteger() call is required because variables are always strings and the expression needs numbers.

    By doing so we set a baseline: by default, the test uses input data from index 0 to index 99.
  • Introduce as many environments as there are slices, each overriding the offset variable.
    Now you can run the test on a specific 100-element slice by selecting the corresponding environment.

  • Finally, you can schedule your slices accordingly
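
The offset/limit technique can be modeled in a few lines of Python to check that the slices line up. This is a sketch under stated assumptions, not the platform's implementation. The default values ("0" and "99"), the environment names and the helper names are hypothetical, chosen so that the baseline matches the "index 0 to index 99" description, and the range is assumed to include both endpoints as in Groovy.

```python
# Hypothetical defaults; values are strings because variables are always strings
GLOBAL_PARAMETERS = {"offset": "0", "limit": "99"}

# Hypothetical environments, each overriding only the offset
ENVIRONMENTS = {
    "slice-1": {"offset": "100"},
    "slice-2": {"offset": "200"},
}


def resolve(environment=None):
    """Merge the global parameters with the selected environment's overrides."""
    variables = dict(GLOBAL_PARAMETERS)
    if environment is not None:
        variables.update(ENVIRONMENTS[environment])
    return variables


def slice_for(input_data, variables):
    """Emulate inputData[offset.toInteger()..offset.toInteger()+limit.toInteger()]."""
    offset = int(variables["offset"])  # string to number, like toInteger()
    limit = int(variables["limit"])
    # + 1 because the emulated range includes its last index
    return input_data[offset:offset + limit + 1]


data = list(range(1000))
baseline = slice_for(data, resolve())
second = slice_for(data, resolve("slice-1"))
print(baseline[0], baseline[-1], len(baseline))
print(second[0], second[-1], len(second))
```

With an inclusive range, a limit of 100 would make each slice 101 items long and overlap the next one by one element, which is why this sketch uses 99. Check the actual values against your own global parameters.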