How to pull data from the OKTA API: an example

OKTA exposes various REST APIs (see the OKTA API reference) from which you can pull data and work with it according to your business requirements. Because OKTA stores only 90 days of records, in many cases you will need to copy the data to an external database before you can analyze it.

To pull the data from OKTA I chose to write a shell script, mainly because it looked like the most straightforward approach. There are other methods you can consider if your project timeline allows. Let's see how this can be done with a shell script.

Step 1: Go through the API reference documents and the filters that OKTA has published online. They are very well documented, which helps if you want to tweak this script.

Step 2: Get an API access token from your OKTA admin and verify that it works, for example with the Postman client.
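If Postman is not at hand, the same check can be sketched with curl. This is a minimal sketch and not part of the original script: the function names `describe_status` and `check_token` are hypothetical, and it assumes that requesting a single record from the users API is enough to exercise the token.

```shell
# Map an HTTP status code to a short verdict (hypothetical helper).
describe_status() {
  case $1 in
    200) echo "token accepted" ;;
    401) echo "token rejected" ;;
    403) echo "token lacks permission" ;;
    429) echo "rate limited, retry later" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Request one user and report the outcome (hypothetical helper).
# Arguments: organization, domain, token. The URL layout matches the script's.
check_token() {
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Accept: application/json" \
    -H "Authorization: SSWS $3" \
    "https://$1.$2.com/api/v1/users?limit=1")
  describe_status "$status"
}
```

A call would look like `check_token company_name okta "$API_TOKEN"`, using the same ORG and DOM values as the script.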

Step 3: Once you have the API access token and a basic understanding of the API filters, you will be able to tweak the script according to your needs.
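The filter expressions in the script are percent-encoded by hand (`%20` for a space, `%22` for a double quote). As a sketch, a small helper can do that encoding so a filter can be written in readable form first. `urlencode` is a hypothetical name, and the function assumes plain ASCII input.

```shell
# Percent-encode a string for use in a URL query (hypothetical helper).
# The body runs in a subshell, so its variables do not leak to the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Encode a readable filter before placing it in the URL.
FILTER=$(urlencode 'status eq "ACTIVE"')
echo "filter=$FILTER"
```

With this in place a URL can be assembled as `"https://$ORG.$DOM.com/api/v1/users?filter=$FILTER&limit=$VAL"` instead of typing the escapes by hand.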

Step 4: Below is the complete shell program, with a brief explanation of what each step does.

# Define your environment variables: organization, domain and API token. These are used to construct the URL in later steps.

# To avoid hard-coding the API token, you can read it from a parameter file instead.
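One way to read the token from a file, sketched under assumptions: the token sits alone on the first line of a file that only its owner can read. The function name `load_api_token` and the path `$HOME/.okta_token` used in the note after the block are hypothetical.

```shell
# Read the API token from the first line of a file (hypothetical helper).
# Prints the token on stdout; returns non-zero if the file is unreadable or empty.
load_api_token() {
  local file token
  file=$1
  token=
  if [ ! -r "$file" ]; then
    echo "cannot read token file: $file" >&2
    return 1
  fi
  # read fails on a last line without a newline, but still fills the variable.
  IFS= read -r token < "$file" || [ -n "$token" ]
  # Drop a carriage return in case the file was saved with Windows line endings.
  token=$(printf '%s' "$token" | tr -d '\r')
  if [ -z "$token" ]; then
    echo "token file is empty: $file" >&2
    return 1
  fi
  printf '%s\n' "$token"
}
```

The script could then set `API_TOKEN=$(load_api_token "$HOME/.okta_token") || exit 1`, with the file protected by `chmod 600`.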

# Start

ORG=company_name

DOM=okta

API_TOKEN=*********************

# Initialize variables with default values.

# Change the destination path to wherever you want to write the data.

# VAL is the pagination limit. PAT and REP_PAT are the pattern and replacement strings used to reformat the JSON output into one record per line. DATE_RANGE is the start date entered by the user; data is pulled from that date onwards.

VAL=1000

DEST_FILE=/var/spark/data

i=1

PAT=

REP_PAT=

DATE_RANGE=2014-02-01

# Choose the API to pull data from (events, logs or users). You can modify the code to export data from any other API.

echo "Enter the name of API - events, logs, users. "

read GID

# Enter the date range to pull data

echo "Enter the date in format yyyy-mm-dd"

read DATE_RANGE

date_func() {

echo "Enter the date in format yyyy-mm-dd"

read DATE_RANGE

}

# Check that the entered date is in the correct format, and ask again until it is

while ! echo "$DATE_RANGE" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; do

echo "Invalid date!! Enter date again.."

date_func || exit 1

done

echo "Valid date!"
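Checking only the shape of the input still accepts impossible dates such as `2014-02-30`. As a sketch, the check can be tightened by asking `date` whether the day actually exists. This assumes GNU `date` (the `-d` option behaves differently on BSD and macOS), and `is_valid_date` is a hypothetical name.

```shell
# Return 0 only for a real calendar date written as yyyy-mm-dd (hypothetical helper).
is_valid_date() {
  local d
  d=$1
  # First check the shape: four digits, two digits, two digits.
  case $d in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  # GNU date rejects impossible days; comparing the round trip also
  # guards against any silent normalisation of the input.
  [ "$(date -u -d "$d" +%Y-%m-%d 2>/dev/null)" = "$d" ]
}
```

In the script this could drive the prompt, for example `until is_valid_date "$DATE_RANGE"; do date_func; done`.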

# Construct the URL based on all the variables defined earlier

URL="https://$ORG.$DOM.com/api/v1/$GID?limit=$VAL"

# Select the API entered by the user. Entries 4 to 10 are empty placeholders in case you want to add new APIs.

case $GID in

events) echo "events API selected"

rm -f /var/spark/data/events.json*

URL="https://$ORG.$DOM.com/api/v1/$GID?filter=published%20gt%20%22${DATE_RANGE}T00:00:00.000Z%22&limit=$VAL"

PAT=}]},{\"eventId\":

REP_PAT=}]}'\n'{\"eventId\":

sleep 1;;

logs) echo "logs API selected"

rm -f /var/spark/data/logs.json*

URL="https://$ORG.$DOM.com/api/v1/$GID?since=${DATE_RANGE}T00:00:00.000Z&limit=$VAL"

PAT=}]},{\"actor\":

REP_PAT=}]}'\n'{\"actor\":

sleep 1;;

users) echo "users API selected"

PAT=}}},{\"id\":

REP_PAT=}}}'\n'{\"id\":

rm -f /var/spark/data/users.json*

URL="https://$ORG.$DOM.com/api/v1/$GID?filter=status%20eq%20%22STAGED%22%20or%20status%20eq%20%22PROVISIONED%22%20or%20status%20eq%20%22ACTIVE%22%20or%20status%20eq%20%22RECOVERY%22%20or%20status%20eq%20%22PASSWORD_EXPIRED%22%20or%20status%20eq%20%22LOCKED_OUT%22%20or%20status%20eq%20%22DEPROVISIONED%22&limit=$VAL"

echo $URL

sleep 1;;

4) echo "four" ;;

5) echo "five" ;;

6) echo "six" ;;

7) echo "seven" ;;

8) echo "eight" ;;

9) echo "nine" ;;

10) echo "ten" ;;

*) echo "INVALID INPUT!"; exit 1 ;;

esac

# Deleting temporary files before running the script

rm -f itemp.txt

rm -f temp.txt

rm -f temp1.txt

# Creating NEXT variable to handle pagination

curl -i -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: SSWS $API_TOKEN" "$URL" > itemp.txt

NEXT=`grep -i 'rel="next"' itemp.txt | awk -F"<" '{print$2}' | awk -F">" '{print$1}'`

tail -1 itemp.txt > temp.txt
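The `grep`/`awk` pipeline above picks the URL out of the `Link` response header marked `rel="next"`. The same step can be sketched as a function that reads the saved response from standard input, which makes it easy to try against a sample. `next_link` is a hypothetical name, the sample response and its `example.okta.com` host are invented for illustration, and the sketch assumes each link arrives on its own `Link` header line.

```shell
# Print the URL of the rel="next" Link header, or nothing if there is none
# (hypothetical helper). Reads a raw HTTP response, as saved by curl -i, on stdin.
next_link() {
  tr -d '\r' |
    grep -i '^link:' |
    grep 'rel="next"' |
    sed -e 's/^[^<]*<//' -e 's/>.*$//' |
    head -n 1
}

# Invented sample response with one "self" and one "next" link.
printf 'HTTP/1.1 200 OK\r\nLink: <https://example.okta.com/api/v1/users?limit=2>; rel="self"\r\nLink: <https://example.okta.com/api/v1/users?after=abc&limit=2>; rel="next"\r\n\r\n[]' |
  next_link
```

Restricting the match to header lines means a `rel="next"` string inside the JSON body cannot be picked up by mistake. In the script the call would be `NEXT=$(next_link < itemp.txt)`.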

# Validating if URL is correctly defined

echo $URL

# Follow the pagination links until NEXT is empty

while [ ${#NEXT} -ne 0 ]; do

echo "this command is executed till NEXT is null, current value of NEXT is $NEXT"

curl -i -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: SSWS $API_TOKEN" "$NEXT" > itemp.txt

tail -1 itemp.txt >> temp.txt

NEXT=`grep -i 'rel="next"' itemp.txt | awk -F"<" '{print$2}' | awk -F">" '{print$1}'`

echo "number of loop = $i, for NEXT reference : $NEXT"

(( i++ ))

sleep 1

done

# Strip the leading [ and trailing ] from each page of results

cat temp.txt | cut -c 2- | rev | cut -c 2- | rev > temp1.txt

rm -f temp.txt

# Formatting the output to create single line JSON records

echo "PATTERN = $PAT"

echo "REP_PATTERN = $REP_PAT"

sed -i "s/$PAT/$REP_PAT/g" temp1.txt

mv temp1.txt /var/spark/data/$GID.json_`date +"%Y%m%d_%H%M%S"`

# END
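The `sed` substitution depends on knowing the first key of each record. If `jq` is installed, a more general way to get one JSON record per line is to let it unpack each array. This is an alternative sketch rather than what the script does: `to_json_lines` is a hypothetical name, and it assumes the input is a sequence of complete JSON arrays, as `temp.txt` is before the brackets are stripped.

```shell
# Turn a stream of JSON arrays into one compact JSON record per line
# (hypothetical helper). Requires jq on the PATH.
to_json_lines() {
  jq -c '.[]'
}
```

Used in place of the `cut`, `rev` and `sed` steps, the call would be `to_json_lines < temp.txt > "/var/spark/data/$GID.json_$(date +%Y%m%d_%H%M%S)"`. No PAT or REP_PAT is needed, so adding a new API does not require a new pattern.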
