Bear in mind that the C drive is not where the operating system lives. These nodes have a non-standard Windows image and the special environment variables listed here. That said, custom images can be used if you want software preloaded onto a compute node. For example, if your custom data cleaning application is very large, or has multiple versions, it could be baked into a VM image used when the compute pool is created. Maybe a topic for another post.
Lastly, consider my comment in point 20 about the data. This is downloaded onto the Batch service compute nodes for processing and, as mentioned, we can RDP to them. By default this is done using a dynamic public IP address. I therefore recommend adding the compute pool to an Azure Virtual Network (VNet), meaning access can only be gained via a private network and a local address.
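To illustrate, here is a minimal sketch of creating a pool inside a VNet subnet using the azure-batch Python SDK. The pool ID, VM size, image details and subnet resource ID are all placeholders, and the Batch account needs permission to join the subnet; treat this as a starting point rather than production code.

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

# Placeholder credentials and endpoint for the Batch account.
credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(
    credentials, batch_url="https://mybatchaccount.westeurope.batch.azure.com"
)

pool = batchmodels.PoolAddParameter(
    id="adf-custom-activity-pool",  # assumed pool name, referenced by the ADF linked service
    vm_size="standard_d2_v3",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="microsoftwindowsserver",
            offer="windowsserver",
            sku="2019-datacenter",
        ),
        node_agent_sku_id="batch.node.windows amd64",
    ),
    target_dedicated_nodes=2,
    # Join the pool to a VNet subnet so nodes can be reached via private addresses.
    network_configuration=batchmodels.NetworkConfiguration(
        subnet_id=(
            "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
            "Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>"
        ),
    ),
)
client.pool.add(pool)
```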
There will only ever be three of these JSON files regardless of how many reference objects are added in the Data Factory settings. To further clarify: if we reference four linked services, they will all be added to the single JSON reference objects file as an array. The array will contain a copy of the JSON used by Data Factory when it calls each linked service in a pipeline.
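For context, here is a minimal sketch of reading that reference objects file from within the custom application. I am assuming the standard file name linkedServices.json that the Batch service drops into the activity's working directory, and the array shape described above.

```python
import json

# linkedServices.json sits in the custom activity's working directory,
# alongside activity.json and datasets.json.
with open("linkedServices.json") as f:
    linked_services = json.load(f)  # one array element per referenced linked service

for service in linked_services:
    # Each element is a copy of the linked service JSON from Data Factory.
    print(service["name"], service["properties"]["type"])
```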
This gets worse with datasets, especially if dynamic content is being passed from Data Factory variables. The dataset name within the reference file gets suffixed with a GUID at runtime. This is nice and safe for the JSON file created on the compute node, but becomes a little more tricky when we want to use the content in our custom application and it has a random name! For example, we get this: AdventureWorksTable81e2671555fc4bc8b8b3955c4581e065
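If it helps, here is a hypothetical way to cope with that in the custom application: look the dataset up by its base name and strip the 32-character hex suffix, rather than hard-coding the runtime name. The helper name and the exact shape of datasets.json are assumptions.

```python
import json
import re

# Runtime names look like "AdventureWorksTable" plus a 32-character hex GUID,
# e.g. "AdventureWorksTable81e2671555fc4bc8b8b3955c4581e065".
GUID_SUFFIX = re.compile(r"^(?P<base>.+?)[0-9a-fA-F]{32}$")

def find_dataset(datasets, base_name):
    """Return the dataset whose runtime name is base_name plus a GUID suffix."""
    for dataset in datasets:
        match = GUID_SUFFIX.match(dataset["name"])
        if match and match.group("base") == base_name:
            return dataset
    return None

with open("datasets.json") as f:
    datasets = json.load(f)

adventure_works = find_dataset(datasets, "AdventureWorksTable")
```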
Hi Paul,
I am creating an ADF pipeline for updating data in Cosmos DB (Mongo API). My source is a CSV file which has a unique key and a value. I have to insert this value into Cosmos DB on the basis of the key (the key already exists in Cosmos DB and is unique).
I am doing this using a pipeline, but my old data is getting deleted and only the new values are inserted into Cosmos DB.
What I want is the old data + the new data in Cosmos DB.
Hi Paul,
We have a similar approach for the Azure custom activity in one of our solutions, implemented on Azure Data Factory version 1 back in 2016.
Now, for the Batch account that is in use for the custom activity linked service in ADF, we are getting an Azure Advisor notification to upgrade the API version to keep the Batch services operational.
However, I have found a special advisory saying that users who are using the Azure Batch linked service for the custom activity can ignore the Advisor notification. The special advisory is given in the link below.
-us/rest/api/batchservice/batch-api-status#special-advisory-for-azure-data-factory-custom-activity-users
Could you please advise whether we can ignore this notification?
The API version with the Batch pool is 2015.1.1.011.
Thanks in advance.
Hi Paul, nice article, congrats!
Is it possible to create a custom activity that makes an API call to get a token to pass to the REST connector? My requirement is to call a protected API and save the content into Azure SQL using ADF.
Thanks for your help.
Use the AzureDataFactoryPipelineRunStatusAsyncSensor (deferrable version) to periodically retrieve the status of a data factory pipeline run asynchronously. This sensor will free up the worker slots since polling for job status happens on the Airflow triggerer, leading to efficient utilization of resources within Airflow.
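As a minimal sketch, the pattern looks roughly like the following, assuming the apache-airflow-providers-microsoft-azure package. Recent provider versions expose the deferrable behaviour via deferrable=True on AzureDataFactoryPipelineRunStatusSensor (the separately named Async class lives in older packages); the pipeline, factory and resource group names here are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import XComArg
from airflow.providers.microsoft.azure.operators.data_factory import (
    AzureDataFactoryRunPipelineOperator,
)
from airflow.providers.microsoft.azure.sensors.data_factory import (
    AzureDataFactoryPipelineRunStatusSensor,
)

with DAG(
    dag_id="adf_run_and_wait",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Start the ADF pipeline without blocking; the run ID is pushed to XCom.
    run_pipeline = AzureDataFactoryRunPipelineOperator(
        task_id="run_pipeline",
        pipeline_name="MyAdfPipeline",            # placeholder
        resource_group_name="my-resource-group",  # placeholder
        factory_name="my-data-factory",           # placeholder
        wait_for_termination=False,
    )

    # Deferrable sensor: polling happens on the triggerer, freeing the worker slot.
    wait_for_run = AzureDataFactoryPipelineRunStatusSensor(
        task_id="wait_for_run",
        run_id=XComArg(run_pipeline, key="run_id"),
        resource_group_name="my-resource-group",  # placeholder
        factory_name="my-data-factory",           # placeholder
        deferrable=True,
    )

    run_pipeline >> wait_for_run
```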
While working in Azure Data Factory, sometimes we need to retrieve metadata information, like the file name, file size, file existence, etc. We can use the Get Metadata activity to retrieve metadata information from a dataset, and then we can use that metadata information in subsequent activities. This article will give an example of how to gather metadata from files and use it in the pipeline.
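To make that concrete before walking through the steps, here is a sketch of the kind of output the activity returns when you request the exists, itemName and size fields. The file name, values and the activity name "Get Metadata1" are invented for illustration.

```python
# Illustrative shape of the Get Metadata activity output for a blob file
# when the field list contains: exists, itemName, size.
get_metadata_output = {
    "exists": True,
    "itemName": "sales_2020_06.csv",  # invented file name
    "size": 1048576,                  # size in bytes
}

# Downstream activities read these values with dynamic content expressions:
#   @activity('Get Metadata1').output.itemName
#   @activity('Get Metadata1').output.size
```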
In this article, we discussed the steps to work with the Get Metadata activity in Azure Data Factory and retrieved metadata information about the files being processed. We captured the file's existence, name, and size. This is a very useful activity for retrieving metadata information for files and relational tables as part of your pipeline.
I am new to Azure Data Factory and trying to solve a business use case where our data is stored in Azure Blob Storage and needs to be copied into Salesforce using Azure Data Factory. I have done a lot of research on this, but all the examples I have found are for copying data from Salesforce to SQL Server using Azure Data Factory.