---
title: 'Azure: Synapse Analytics w/ DataBricks, DevOps, and AzureML'
author: "Andrew Fogarty"
date: "05/07/2021"
output:
  html_document:
    toc: yes
    df_print: paged
  rmarkdown::html_vignette:
    toc: yes
    number_sections: yes
editor_options:
  chunk_output_type: console
---
```{r, message = FALSE, warning = FALSE}
# load python
library(reticulate)
use_python('C:/Users/Andrew/Anaconda3/')
use_condaenv(condaenv='my_ml', required=TRUE)
library(knitr)
```
# Introduction
Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.
In this demo, we will walk through the necessary procedures to:
* 1. Create an Azure Synapse environment
* 2. Write SQL tables to Parquet and dump them into our Data Lake
* 3. Run automated pipeline computations on our Data Lake files with AzureML
* 4. Connect our Data Lake with Azure Data Bricks for Big Data analytics
* 5. Setup Azure DevOps for Synapse version control
# Create an Azure Synapse Environment
In order to create an Azure Synapse environment, several preparatory objects are needed:
* 1. An Azure Resource Group
* 2. An Azure Storage account (can create on the fly when creating Synapse)
* 3. An Azure file system name (can create on the fly when creating Synapse)
In the image below, we create the Synapse workspace, storage account, and file system name in one single effort.
![](1.png){ width=80% }
# Initiate Azure Synapse Analytics
We can now launch Azure Synapse Analytics, which looks like the below image:
![](2.png){ width=80% }
# Create SQL Server and Database
We will first create a basic SQL server and database with `AdventureWorks` data so that we can gain experience setting that up before we extract information from it. When creating the SQL server, ensure that you access the `Networking` tab and select: `Allow Azure services and resources to access this server`.
![](3.png){ width=80% }
## Create AdventureWorks Database
Azure makes it easy to load a SQL database with sample data. Once the SQL server is provisioned, we need to create an Azure SQL Database in a similar fashion. Ensure that on the `Additional settings` tab, you select `Sample` from `Use existing data`.
![](4.png){ width=80% }
## Giving Synapse Access to the SQL Server
Next, we need to create some specific permissions for Synapse such that it can connect to our SQL database. This requires that we gain access to our SQL server to execute a few commands. There are two ways to do this:
1. Set a firewall allowance for your IP. On the SQL Database page, select `Set server firewall`. Find your IP address and specify it here; hit `Save`. Now, you can select `Query editor` on the SQL Database page and login with the username and password you instantiated.
![](5.png){ width=80% }
2. Set yourself as an Azure Active Directory admin over the SQL *Server*. Now, return to the SQL *Database*, select `Query editor`, and login with Azure Active Directory.
![](6.png){ width=80% }
### Query Editor Commands
Now that we have access to our SQL server, we need to execute a few commands to let our Synapse application have `Managed Identity` access to it.
```{sql eval=FALSE}
create user [synapseaftemp] from external provider;
exec sp_addrolemember 'db_datareader', 'synapseaftemp';
```
This should successfully run. Now we are ready to instantiate a link to our SQL database with Azure Synapse.
![](7.png){ width=80% }
# Create a Connection to Azure SQL Database
To use a modern data warehousing technology like a Data Lake, we need to fill it with data. Data Lakes store data of any size, shape, and speed, and support all types of processing and analytics across platforms and languages. Synapse can fill our Data Lake with data from just about any source we can imagine, but to do that, we must set up connections between the sources of our data (e.g., SQL tables) and the sink (our Data Lake) so that we can copy the data and have our own versions for analysis.
Our first task is to connect to an Azure SQL Database which has the commonly known `AdventureWorks` data set already loaded. We take the following steps:
1. Select the `Data` tab
2. Select the `Linked` tab
3. Press the `+` sign
4. Select `Integration dataset`
5. Select `Azure`
6. Select `Azure SQL Database` -- the data source we want a connection with
7. Provide a unique name for the connection; here we use `aw_tables`
8. Select `+` under Linked Service, as it's our first time establishing a connection here
9. Provide another name for the linked service; here we use `sqldatabase_adventureworks`
10. Select the server from subscription (or manually)
11. Select the `Server name`: `awtempserver`
12. Select the `Database name`: `adventureworks`
13. Select `Authentication type`: `Managed Identity` (the SQL commands above gave it this privilege)
14. Test the connection and create it if successful
15. Choose no specific table name (unless desired)
16. Select the `Import schema`: `None`
![](8.png){ width=80% }
We now have a connection established to our SQL database; select `Publish all`!
# Copy SQL Database Data to Azure Data Lake Storage Gen2
Now we are ready to retrieve data from our SQL database. We proceed with the following steps:
1. Select the `Integrate` tab on the left of Synapse
2. Press the `+` sign
3. Select `Copy Data tool`
4. Select `Built-in copy task`
5. Press `Next`
6. For `Source data store`, select the `Connection`: `sqldatabase_adventureworks`
7. Select the tables we want to retrieve: `SalesLT.Address` and `SalesLT.Customer`
8. Press `Next`
9. Apply no filter -- Press `Next`
10. For `Destination data store`, select `Connection`: `synapseaftemp-WorkspaceDefaultStorage` (our ADLSGen2), `Folder path`: `root/RAW/adventure_works` (where the parquet files will be stored), `Folder name suffix`: replace `.txt` with `.parquet` (or the extension of the file you want to use in the next step)
11. For `File format settings`, select `Parquet` from `File format`, select `Next`
12. Specify our data copy pipeline with an interpretable name: `aw_datacopy_pipeline`
13. Review the `Summary` and select `Next`, and then `Finish`
Now we can take a peek into our data lake and see that our pipeline successfully extracted our tables and placed them into the `root/RAW/adventure_works` folder.
![](9.png){ width=80% }
# Integrate Synapse with AzureML
## Generate Service Principal Security
Next, let's see how we can access and analyze the information in our data lake in conjunction with AzureML. At this stage, we assume that you have created an AzureML resource, created a compute cluster to use, and created a new notebook.
We first need to create a Service Principal. Service Principals manage authentication and come in handy when we want to automate workflows. This type of authentication decouples authentication from any specific user and allows for managed access control. To create a Service Principal, we:
1. Enter Azure Active Directory from the Azure Portal
2. Select `App registrations`
3. Select `new registration`
4. Give it a unique, recognizable name, e.g., AzureML SP CHIECXO
5. Assign it a single tenant
6. Select `Register`
Next, we need to retrieve information of interest from the app registration page:
1. Application (client) ID
2. Directory (tenant) ID
3. Secret
The secret is obtained by selecting `Certificates & secrets` and generating a `new client secret`. Save this information, as follows, to a file named `.env`.
```
AML_TENANT_ID=Directory (tenant) ID
AML_PRINCIPAL_ID=Application (client) ID
AML_PRINCIPAL_PASS=Secret
```
We do this to keep these credentials out of our notebooks. We will store this file on AzureML and load it with the Python package `python-dotenv`.
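As a quick sketch of how the `.env` file is consumed once `python-dotenv` is installed (see the next section), and assuming the file sits next to the notebook, the variables can be loaded into the environment like this:
```{python, eval=FALSE}
# load the service principal credentials from the .env file into the environment
import os
from dotenv import load_dotenv

load_dotenv('.env')  # populates os.environ with AML_TENANT_ID, AML_PRINCIPAL_ID, AML_PRINCIPAL_PASS
assert 'AML_TENANT_ID' in os.environ  # sanity check that the file was found
```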
## Ensure AzureML is Updated
First, run these commands in a Jupyter notebook:
```{python, eval=FALSE}
!pip install --upgrade azureml-core
!pip install --upgrade azureml-pipeline-core
!pip install --upgrade azureml-sdk
!pip install --upgrade azureml-defaults
!pip install python-dotenv
```
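The chunks that follow assume the relevant AzureML classes are already imported. A consolidated import cell along these lines (paths as of roughly `azureml-core` 1.31) should cover everything used below:
```{python, eval=FALSE}
# consolidated imports assumed by the remaining chunks
import os
from azureml.core import Workspace, Experiment, Datastore, Dataset
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
```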
## Establish the Service Principal
```{python, eval=FALSE}
# instantiate the service principal
sp = ServicePrincipalAuthentication(tenant_id=os.environ['AML_TENANT_ID'],
                                    service_principal_id=os.environ['AML_PRINCIPAL_ID'],
                                    service_principal_password=os.environ['AML_PRINCIPAL_PASS'])
```
## Establish a Workspace
```{python, eval=FALSE}
# instantiate a workspace
ws = Workspace(subscription_id="2c3b88a1-7aa0-4107-b413-d4c701e0afc8",
               resource_group="rg_chie_training",
               auth=sp,  # use service principal auth
               workspace_name="training_aml")
print("Found workspace {} at location {}".format(ws.name, ws.location))
```
## Establish a Compute Cluster
```{python, eval=FALSE}
# choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"
# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS3_v2',
                                                           idle_seconds_before_scaledown=400,
                                                           min_nodes=0,
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    cpu_cluster.wait_for_completion(show_output=True)
```
## Create Compute Package Dependencies
```{python, eval=FALSE}
# create a new runconfig object
aml_run_config = RunConfiguration()
# use the aml_compute you created above.
aml_run_config.target = cpu_cluster
# set docker base image to the default CPU-based image
aml_run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE
# Use conda_dependencies.yml to create a conda environment in the Docker image for execution
aml_run_config.environment.python.user_managed_dependencies = False
# Specify CondaDependencies obj, add necessary packages
aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['pip', 'pandas', 'numpy', 'python==3.6'],
    pip_packages=['azureml-core==1.31.0', 'azureml-sdk', 'python-dotenv'])
print("Run configuration created.")
```
## Generate Access to the ADLSGen2 Data Lake
To set up a link to the Synapse ADLSGen2 Data Lake, we need to run the code below and gather the following pieces of information:
1. `subscription_id`: the subscription ID of the ADLSGen2 account found on the Azure portal -> Containers
2. `resource_group`: the resource group that is paying for the ADLSGen2 account
3. `account_name`: the name of the storage account associated with the ADLSGen2 found on Azure portal -> Containers
4. `tenant_id`: the tenant ID of your organization
5. `client_id`: the Service Principal client ID. You must go to the Azure portal, click App Registration, and create a new app registration for the ADLSGen2 account. The client ID for this app registration goes here.
6. `client_secret`: the secret for the app registration. To create one, go to: Azure Active Directory -> App Registration (for #5 above) -> Certificates & secrets
```{python, eval=FALSE}
# set ADLS Gen2 storage account alias in AML
adlsgen2_datastore_name = 'my_lake'
# subscription id of ADLS account
subscription_id=os.getenv("ADL_SUBSCRIPTION", "redacted")
# resource group of ADLS account
resource_group=os.getenv("ADL_RESOURCE_GROUP", "redacted")
# ADLS Gen2 account name
account_name=os.getenv("ADLSGEN2_ACCOUNTNAME", "aflake")
# tenant id of service principal -> Azure Active Directory
tenant_id=os.getenv("ADLSGEN2_TENANT", "redacted")
# client id of service principal -> Azure Active Directory -> App Registration
client_id=os.getenv("ADLSGEN2_CLIENTID", "redacted")
# the secret of service principal -> Azure Active Directory -> App Registration -> Certificates & secrets
client_secret=os.getenv("ADLSGEN2_CLIENT_SECRET", "redacted")
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(workspace=ws,
                                                             datastore_name=adlsgen2_datastore_name,
                                                             account_name=account_name,    # ADLS Gen2 account name
                                                             filesystem='root',            # ADLS Gen2 filesystem
                                                             tenant_id=tenant_id,          # tenant id of sp
                                                             client_id=client_id,          # client id of sp
                                                             client_secret=client_secret)  # the secret of sp
```
## Pipeline Preparation
Notice that the location information below ignores `root`, the name of the container for the ADLSGen2 account. This is necessary because we already specify the filesystem above in `adlsgen2_datastore`.
```{python, eval=FALSE}
# create a Dataset object from parquet files located on the data lake
my_dataset = Dataset.Tabular.from_parquet_files([(adlsgen2_datastore, '/RAW/adventure_works/SalesLTAddress.parquet')])
```
```{python, eval=FALSE}
# location to write pipeline output to
step1_output_data_adlsgen2 = OutputFileDatasetConfig(name="processed_data",
                                                     destination=(adlsgen2_datastore, "/RAW/out")
                                                     ).as_upload(overwrite=True)
```
Next, we instantiate a `PythonScriptStep`, the workhorse of an AzureML pipeline.
```{python, eval=FALSE}
# pipeline step 1
step1 = PythonScriptStep(
    name="generate_data",
    inputs=[my_dataset.as_named_input('raw_data')],
    script_name="scripts/do_stuff.py",
    arguments=["--save", step1_output_data_adlsgen2],
    runconfig=aml_run_config,
    allow_reuse=False
)
```
## Create do_stuff.py Pipeline Script
To do things to our parquet data, we need to prepare our `scripts/do_stuff.py` file, located in AzureML, very carefully. Below is a minimal working example:
```{python, eval=FALSE}
#
import os
from azureml.core import Run
import argparse
import pandas as pd
# load argparse
# note that --save refers to --save in PythonScriptStep
parser = argparse.ArgumentParser()
parser.add_argument("--save", type=str)
args = parser.parse_args()
# access input data
# note that 'raw_data' refers to my_dataset.as_named_input('raw_data') in PythonScriptStep
run = Run.get_context()
raw_data = run.input_datasets["raw_data"]
# turn the data into a dataframe
df1 = raw_data.to_pandas_dataframe()
# do something to the dataframe here
# finish by writing the processed data to storage
if args.save is not None:
    os.makedirs(args.save, exist_ok=True)
    print("%s created" % args.save)
    path = args.save + "/processed.parquet"  # file name for parquet in ADLSGen2
    df1.to_parquet(path)  # write the processed data to the output location
#
```
## Execute the Pipeline
Now we can execute the pipeline. If successful, we will move on to publish it so we can continually trigger it as our needs require.
```{python, eval=FALSE}
# build the pipeline
pipeline1 = Pipeline(workspace=ws, steps=[step1])
# validate the pipeline
pipeline1.validate()
# submit a pipeline run
pipeline_run1 = Experiment(ws, 'run_synapse_data').submit(pipeline1)
# run and wait for completion to check its results
pipeline_run1.wait_for_completion(show_output=True)
```
## Publish the Pipeline
If we want to run this pipeline repeatedly through Synapse or AzureML, we need to publish it. The endpoint below provides the unique identifiers needed to select the pipeline from among the many we create.
```{python, eval=FALSE}
# publish the pipeline for use in Synapse
published_pipeline = pipeline_run1.publish_pipeline(
name='synapse_aml_pipeline',
description='Endpoint for transforming synapse data and uploading to blob storage',
version='1.0')
print(published_pipeline.endpoint)
```
# Create a Synapse-AzureML Pipeline
To close the Synapse -> AzureML pipeline loop, we need to establish a link between AzureML and Synapse so we can hook up our pipeline. To do so, we need to go into our AzureML resource and give our Synapse application a `contributor` role through `Access control (IAM)`. Once complete, we must do the following in Synapse to establish a connection:
1. Select the `Integrate` tab on the left of Synapse
2. Under the `Activities` column, select `Machine Learning` and drag `Machine Learning Execute Pipeline` onto the pipeline space.
3. Click `Machine Learning Execute Pipeline` and select the `Settings` tab.
4. Establish a new Azure Machine Learning linked service
5. Test the connection and save.
## Run AzureML Pipeline after Synapse Copies Fresh Data to the ADLSGen2
With a connection in hand, we can now connect our Synapse pipeline to AzureML such that after Synapse brings in fresh data to our data lake we can then have AzureML run some arbitrarily complex processing on it and save its output back to our data lake. To do so, we need to:
1. Select the `Machine Learning Execute Pipeline`
2. Select `Settings`
3. For `Machine Learning pipeline name`, choose our published pipeline name: `synapse_aml_pipeline`
4. For `Machine learning pipeline ID`, choose the most recent result, or, the value returned from the above command in AzureML: `print(published_pipeline.endpoint)`
5. Publish
![](10.png){ width=80% }
## Trigger the Synapse-AzureML Pipeline
We are now ready to trigger the full pipeline which will:
1. Copy fresh data from our SQL database and place it into our data lake
2. Have AzureML read and arbitrarily transform the data, then save it back to the data lake in the `out` folder
To run this operation, we need to select the `Integrate` tab on the left and select `Add trigger` -> `Trigger Now`.
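The same published pipeline can also be triggered programmatically from AzureML, which is handy for testing outside of Synapse. A minimal sketch, assuming the workspace `ws` from earlier and the pipeline ID reported by `published_pipeline.id` (the placeholder below is yours to fill in):
```{python, eval=FALSE}
# retrieve the published pipeline by its ID and submit it as a new experiment run
from azureml.core import Experiment
from azureml.pipeline.core import PublishedPipeline

published = PublishedPipeline.get(workspace=ws, id="<published-pipeline-id>")  # ID from published_pipeline.id
run = Experiment(ws, 'run_synapse_data').submit(published)
run.wait_for_completion(show_output=True)
```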
# Connect Azure Data Lake Storage Gen2 to Azure Data Bricks
Our next logical task is to use our data in analysis with Data Bricks for parallel computing. To begin this process, we want to create an Azure Databricks workspace through the Azure Services homepage.
![](11.png){ width=80% }
With our Data Bricks instance created, our next step is to create the security architecture that will allow Data Bricks to interact with our Data Lake.
Our first step is to create a key vault named `dbkvdemo` using Azure `Key vaults`. Next, we access Azure `App registrations` and create a new registration called `db_app`. We want to extract a couple important items from our registration once it is completed:
* Application (client) ID: 40ee072f-1759-4148-956a-c9f395cd5ad0
* Directory (tenant) ID: 72f988bf-86f1-41af-91ab-2d7cd011db47
Next, we need to create a `Client secret` inside our `App registrations` for `db_app`. We do this by selecting `Certificates & secrets` in the left pane, clicking `+` for `New client secret`, adding a description `db_secret`, and setting an appropriate expiration. We then want to copy the value of `db_secret`, which is visible now but will no longer be available after we leave this window: h_tVc.b2Z-Pq-alr~WU56mNm8V00Sez8l6.
Now, we need to return to our `Key vaults` where we select our key vault `dbkvdemo` and select `Secrets` on the left pane. We then select `+ Generate/Import`, give it a name, `dbappkey`, and give it the value from the `App registrations` -> `Certificates & secrets` -> `Client secrets` -> `Value` (h_tVc.b2Z-Pq-alr~WU56mNm8V00Sez8l6).
Now, we need to create a `Secret Scope` in Data Bricks. We can access it by opening our Data Bricks workspace URL, e.g., `https://adb-368901377989901.1.azuredatabricks.net/?o=368901377989901`, and appending `#secrets/createScope` to the end. We set our `ScopeName` to `db_scope`. Ensure that `Manage Principal` reads `All Users`, as `Creator` requires the premium subscription. For the `DNS Name`, we get that information by opening our `Key vault` -> `Properties` -> `Vault URI` (https://dbkvdemo.vault.azure.net/). The `Resource ID` is found on the same page under `Resource ID` and starts with `/subscriptions/`.
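Before wiring up the mount in the next step, we can optionally confirm from a Databricks notebook cell that the scope and key are visible (a quick sanity check, assuming the names used above):
```{python, eval=FALSE}
# confirm the secret scope and key exist before attempting the mount
print(dbutils.secrets.listScopes())      # should include 'db_scope'
print(dbutils.secrets.list("db_scope"))  # should include 'dbappkey'
```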
With our `Secret Scope` created, we can now create a new notebook and try accessing our Data Lake. To instantiate the access, we want to fill out the following code:
```{python, eval=FALSE}
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<application-id>",
"fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
mount_point = "/mnt/<mount-name>",
extra_configs = configs)
```
We replace the placeholder fields as follows:
* `<application-id>`: The application (client) ID value found in our `App registrations`.
* `<scope-name>`: Our Data Bricks `Secret Scope` name: `db_scope`
* `<service-credential-key-name>`: The name of the `Secret` created under `Secrets` in Azure `Key vaults`. In our case, `dbappkey`.
* `<directory-id>`: The directory (tenant) ID found in our `App registrations`
* `<container-name>`: The name of our container for our Data Lake: `demolake` (found in Azure `Storage accounts`)
* `<storage-account-name>`: The name of the Azure storage account: `demoaf`
* `<mount-name>`: We will remove this and mount at the base of our Data Lake
Lastly, we need to grant some access to our Data Lake. To do so, navigate to Azure `Storage accounts` and select `Access Control (IAM)`. Next, we select `Add role assignment` and choose `Storage Blob Data Contributor`, because we want to read and write data through our Data Bricks application. We need to assign this privilege to a member, so we click `+ Select members` and, in the search box, enter the name of the `App registration` we made earlier in this process: `db_app`. We then review and assign the role. The assignment may take a few minutes to propagate.
Now we can inject all of this data into our Data Bricks Notebook like so:
![](12.png){ width=80% }
And we can see that we now have access to the parquet files located in our Data Lake, so we can proceed with cleaning our data.
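As a short illustrative sketch of what that access looks like once the mount succeeds (the path below assumes the parquet files were copied under a `RAW/adventure_works` folder, as in the earlier Synapse pipeline; adjust it to your own mount point and lake layout):
```{python, eval=FALSE}
# read a parquet file from the mounted Data Lake into a Spark DataFrame
df = spark.read.parquet("/mnt/RAW/adventure_works/SalesLTAddress.parquet")
df.printSchema()
display(df.limit(10))  # Databricks notebook helper for tabular display
```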
# Set Up Azure DevOps in Synapse for Git Control
Lastly, we want to create and set up Azure DevOps inside Synapse for a few reasons. First, we want version control; second, we want to back up our work.
Our first step is to access Azure DevOps (https://dev.azure.com/), which is hosted outside of the usual Azure services cloud portal. Inside DevOps, we want to create a new project called `synapse_demo` and initialize a repo.
![](13.png){ width=80% }
Next, in Azure Synapse, we want to select the `Manage` tab on the left and select `Git configuration` and connect to our recently created repo on DevOps. For the `collaboration_branch` we should create a new branch called `dev` (unlike the picture below) to separate our work from `main`.
![](14.png){ width=80% }
We now have version control working in Synapse. Next, we will hit `Publish` at the top to send our work to DevOps and we will notice that in the `dev` branch, Synapse has pushed our pipelines and integrated dataset connections to DevOps. Our work is now backed up.
![](15.png){ width=80% }
# Copy Data From Kusto Servers to Data Lake
Within Microsoft, Kusto servers are often used to store and retrieve data. Yet, extracting bulk data from Kusto servers can be tricky, owing to their file size and memory limits. The code below, placed in the `Source` section of our Copy Data tool, lets us extract as much information as we need and copy it to our data lake.
```{python, eval=FALSE}
set query_fanout_nodes_percent = 50;
let startTime = ago(3d);
let endTime = ago(0d);
cluster("kusto.windows.net").database("db1").nodecpuinfo
| where TIMESTAMP between (startTime .. endTime)
| project TIMESTAMP, Var1, Var2, Var3
| take 1750000;
```