Increase the Scalability of HiveServer2 with Qubole
HiveServer2 (HS2) is an integral component of any Hive deployment that provides a multi-tenant service endpoint for executing Hive queries concurrently. In Qubole Data Platform (henceforth abbreviated as QDP), the HiveServer2 Java Virtual Machine (JVM) runs on the Hadoop cluster’s master node, whereas the Beeline client runs on a multi-tenant tier on the QDP control plane.
https://www.qubole.com/blog/increase-scalability-of-hiveserver2/
Built analytics at Ada with Airflow and Redshift
At the beginning of 2018, when I was packing my bags to move from Moscow to Toronto, my dear colleagues, whom I didn’t know yet, were receiving an increasing number of requests for insights into how Ada’s virtual assistant was doing. How many of the conversations the virtual assistant started received a response from the user? How many conversations happened on a mobile device? What are the most popular responses the virtual assistant gives to French-speaking customers?
Migrating a Big Data Environment to the Cloud
We’ve discussed what we wanted our cloud MVP to look like. The next question was: how do we get there without turning the company off for a month? We started with what we knew we needed. For at least a few months, our application teams needed to be able to split internal services across GCP and our datacenter, so we needed a single logical network between the two. We spun up a Cloud Interconnect between us-central1 and our datacenter to bridge our environments.
The Misunderstood Role Of A Data Engineer
Recently, while out with a group of coworkers, I was being introduced to a new colleague and the conversation went something like this: “And this is Hussein, he’s a data scientist.” I am, however, a data engineer and not a data scientist. That same evening, the situation repeated itself with a different crowd: “This is Hussein, he’s in data… wait, what do you actually do?” This wasn’t the first time I had this conversation, and it’s safe to say it wouldn’t be the last time either.
Data profiling in the age of Big Data
In today’s world of increasing connectivity, there is no denying that the amount of data being generated is enormous. Add to this how inexpensive storage has become and how fast networks are, and it is no wonder that we are talking about big data.
https://medium.com/tech-at-nordstrom/data-profiling-in-the-age-of-big-data-7675d486c89c