Snowflake and REST API

Adrian Macal
4 min readNov 2, 2021

As expected from data warehousing solution Snowflake offers many connectors compatible with industry standards in each programming language. As modern data warehousing solution Snowflake offers REST API agnostic to any programming language.

Generally when I started using Snowflake I chose default option and applied ready to use Python connector. It is compatible with PEP 249 and therefore easy to start with. Recently I wanted to use it in AWS Lambda to expose some data over AWS API Gateway (which sounds like antipattern) and I came across some issues:

  • it is quite big (around 50 MB) and all imports take time to warm up my lambda function
  • it offers some support of async queries, but it adds a lot of abstraction to it, so all internal cloud benefits are removed

I spent time checking how it is implemented and how I can simplify it or even to get rid of it.

https://{account}.{region}.snowflakecomputing.com

Everything is hosted in some region and for some dedicated tenant. The address is very familiar to all Snowflake users. It is typical way how we access the console. Behind the address you can also find the REST API.

POST /session/v1/login-request?warehouse={warehouse}

As expected the very first activity is the authentication. Because I am accessing it with temporary username and password I need to provide them in this endpoint. The simplified request I managed to craft looks like this one:

{
"data": {
"CLIENT_APP_ID": "lightweight-client",
"CLIENT_APP_VERSION": "0.0.1",
"ACCOUNT_NAME": "...",
"LOGIN_NAME": "...",
"PASSWORD": "..."
}
}

If everything goes well I am getting token in a response something similar to:

{
"data": {
"masterToken": "ver:1-hint:30...90-ETM...iE",
"token": "ver:1-hint:30...90-ET...sI",
"validityInSeconds": 3600,
"masterValidityInSeconds": 14400,
"displayUserName": "Client 8...xC",
"serverVersion": "5.38.0",
"firstLogin": true,
"remMeToken": null,
"remMeValidityInSeconds": 0,
"healthCheckInterval": 45,
"newClientForUpgrade": null,
"sessionId": 30...62,
"parameters": [ ... ],
"sessionInfo": { ... },
"idToken": null,
"idTokenValidityInSeconds": 0,
"responseData": null,
"mfaToken": null,
"mfaTokenValidityInSeconds": 0
},
"code": null,
"message": null,
"success": true
}

POST /queries/v1/query-request?requestId={random-uuid}

Having a token I am able to issue SQL query passing the token in the Authorization header as Snowflake Token=”ver:1-hint:30..90-ET..l8" with the following payload. The important piece in the snippet is the asyncExec flag, which indicates we do not want to wait for a result.

{
"sqlText": "select t.index, t.value::text as country from table(flatten(input => parse_json(?))) t",
"asyncExec": true,
"sequenceId": 1,
"querySubmissionTime": 1635322866647,
"bindings": {
"1": {
"type": "ANY",
"value": "[\"DE\", \"PL\", \"US\"]"
}
}
}

The incoming response looks straightforward. The most important piece are the query id and the success flag. The flag indicates if the query was taken by the system, but does not tell anything about the its execution status.

{
"data": {
"queryId": "01a...197e",
"getResultUrl": "/queries/01a...197e/result",
"queryAbortsAfterSecs": 300,
"progressDesc": null
},
"code": "333334",
"message": "Asynchronous execution in progress. Use provided query id to perform query monitoring and management.",
"success": true
}

GET /monitoring/queries/{query-id}

The query was just issued. I can check only its status by calling another endpoint. I need to pass acquired token in the Authorization header. If everything is successful I should get something like the following response, which indicates that the query succeeded.

{
"data": {
"queries": [
{
"id": "01...7e",
"status": "SUCCESS",
"state": "SUCCEEDED",
"sqlText": "select t.index, t.value::text as country from table(flatten(input => parse_json(?))) t"
}
],
"sessionsShort": [ ... ]
},
"code": null,
"message": null,
"success": true
}

POST /queries/v1/query-request?requestId={random-uuid}

Knowing the query succeeded I will issue another synchronous query to the endpoint I already touched. This time I will query the result set by query id. It is very important to not forget about Authorization header and to adjust the Accept header to application/snowflake.

{
"sqlText": "select * from table(result_scan('01...7e'))",
"asyncExec": false,
"sequenceId": 1,
"querySubmissionTime": 1635066639000
}

And the response is promising. It contains even all the data.

{
"data": {
"parameters": [ ... ],
"rowtype": [
{ "name": "INDEX", ... },
{ "name": "COUNTRY", ... }
],
"rowset": [
[ "0", "DE" ],
[ "1", "PL" ],
[ "2", "US" ]
],
"total": 3,
"returned": 3,
"queryId": "01...62",
"queryResultFormat": "json"
},
"code": null,
"message": null,
"success": true
}

It is not the end of the story. The interesting piece is happening when my query returns more data to be included in this simple json response. How is it handled? The returned payload looks a bit differently. It does not contain any row in the rowset array, but it contains chunkHeaders and chunks. The chunks are simply S3 offloaded encrypted objects ready to download. The object have exactly the same JSON format as rowset would have.

{
"data": {
"parameters": [ ... ],
"rowtype": [
{ "name": "INDEX", ... },
{ "name": "COUNTRY", ... }
],
"rowset": [],
"qrmk": "kg...uB0=",
"chunkHeaders": {
"x-amz-server-side-encryption-customer-key": "...",
"x-amz-server-side-encryption-customer-key-md5": "..."
},
"chunks": [
{
"url": "https://...",
"rowCount": 525,
"uncompressedSize": 100002,
"compressedSize": 22505
}
],
"total": 11785,
"returned": 11785,
"queryId": "01...a2",
},
"code": null,
"message": null,
"success": true
}

Actual implementation

By introduction I mentioned I needed REST approach to deal with the Snowflake in AWS Lambda. Actually I managed to create working implementation and currently it is in POC stage.

--

--

Adrian Macal

Software Developer, Data Engineer with solid knowledge of Business Intelligence. Passionate about programming.