Quick note about Hive and Presto

I am building an application using Treasure Data. Treasure Data provides Hive and Presto for executing jobs, so here are my notes on the two.

Hive is a program to manage big data, built on top of Hadoop.

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids portability of SQL-based applications to Hadoop.


https://en.wikipedia.org/wiki/Apache_Hive

How Hive works

Hive translates SQL queries into multiple stages of MapReduce and it is powerful enough to handle huge numbers of jobs (Although as Arun C Murthy pointed out, modern Hive runs on Tez whose computational model is similar to Spark’s). MapReduce is fault-tolerant since it stores the intermediate results into disks and enables batch-style data processing. Many of our customers issue thousands of Hive queries to our service on a daily basis. A key advantage of Hive over newer SQL-on-Hadoop engines is robustness: Other engines like Cloudera’s Impala and Presto require careful optimizations when two large tables (100M rows and above) are joined. Hive can join tables with billions of rows with ease and should the jobs fail it retries automatically. Furthermore, Hive itself is becoming faster as a result of the Hortonworks Stinger initiative.
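To make the "SQL becomes MapReduce stages" idea concrete, here is a minimal single-process sketch of how a GROUP BY query could be split into map, shuffle, and reduce steps. It is only an illustration: the table name, columns, and rows are made up, and real Hive compiles a full query plan, runs the stages across a cluster, and writes intermediate results to disk.

```python
from collections import defaultdict

# Hypothetical rows of a table "access_log" (user, bytes); not from any real dataset.
ROWS = [
    {"user": "alice", "bytes": 120},
    {"user": "bob", "bytes": 300},
    {"user": "alice", "bytes": 80},
]


def map_stage(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the column to sum."""
    for row in rows:
        yield row["user"], row["bytes"]


def shuffle_stage(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_stage(grouped):
    """Aggregate the values of each key, here with SUM."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    # Roughly: SELECT user, SUM(bytes) FROM access_log GROUP BY user
    return reduce_stage(shuffle_stage(map_stage(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

A query with a join or a second aggregation would chain several such rounds, which is what "multiple stages" refers to.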

How Presto Works

Presto is a distributed SQL query engine. It can query data stored in Hadoop, but it does not run queries as MapReduce jobs.

In some instances simply processing SQL queries is not enough: it is necessary to process queries as quickly as possible so that data scientists and analysts can use Treasure Data for quickly gaining insights from their data collections. For these instances Treasure Data offers the Presto query engine. Presto is an in-memory distributed SQL query engine developed by Facebook that has been open-sourced since November 2013. Presto has been adopted at Treasure Data for its usability and performance.

Presto versus Hive: What You Need to Know


Use Hive for routine batch jobs over large data, where robustness matters more than speed. Use Presto for fast, interactive queries that fetch smaller (simpler) data.

Active Directory

Active Directory (AD) is a directory service that Microsoft developed for Windows domain networks. It is included in most Windows Server operating systems as a set of processes and services. Initially, Active Directory was only in charge of centralized domain management. Starting with Windows Server 2008, however, Active Directory became an umbrella title for a broad range of directory-based identity-related services.


https://en.wikipedia.org/wiki/Active_Directory

Active Directory is a database-based system that provides authentication, directory, policy, and other services in a Windows environment.

LDAP (Lightweight Directory Access Protocol) is an application protocol for querying and modifying items in directory service providers like Active Directory, which supports a form of LDAP.
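As a small illustration of what "querying" a directory over LDAP looks like, the sketch below builds an LDAP search filter string for one user. It assumes the directory is Active Directory, where user objects carry the `sAMAccountName` attribute; the function names are my own, and actually sending the filter to a server would need an LDAP client library, which is left out here. The escaping follows the special characters listed in RFC 4515.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    # Replace character by character so an escaped backslash is never escaped twice.
    return "".join(replacements.get(ch, ch) for ch in value)


def user_filter(account_name):
    """Build a filter that matches a user object by its sAMAccountName."""
    safe_name = escape_filter_value(account_name)
    return f"(&(objectClass=user)(sAMAccountName={safe_name}))"


if __name__ == "__main__":
    print(user_filter("alice"))
```

Escaping matters because a raw `*` in user input would otherwise act as a wildcard and match every account.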


https://stackoverflow.com/questions/663402/what-are-the-differences-between-ldap-and-active-directory

Active Directory Domain Services is Microsoft’s Directory Server. It provides authentication and authorization mechanisms as well as a framework within which other related services can be deployed (AD Certificate Services, AD Federated Services, etc). It is an LDAP compliant database that contains objects. The most commonly used objects are users, computers, and groups. These objects can be organized into organizational units (OUs) by any number of logical or business needs. Group Policy Objects (GPOs) can then be linked to OUs to centralize the settings for various users or computers across an organization.
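The relationship between objects, OUs, and GPOs can be sketched roughly as follows. This is a simplified model, not how Active Directory is implemented: it assumes settings from GPOs linked closer to the object override those linked higher up, and it ignores sites, local policy, enforced links, blocked inheritance, and security filtering. The domain, OU names, and setting names are hypothetical.

```python
def containers_root_first(distinguished_name):
    """Return the domain and OU containers of a DN, outermost first.

    Naive parsing: assumes the DN contains no escaped commas.
    """
    parts = [p.strip() for p in distinguished_name.split(",")]
    domain = ",".join(p for p in parts if p.upper().startswith("DC="))
    ous = [p for p in parts if p.upper().startswith("OU=")]
    containers = [domain]
    path = domain
    # A DN lists the closest OU first, so walk the OUs in reverse.
    for ou in reversed(ous):
        path = f"{ou},{path}"
        containers.append(path)
    return containers


def effective_settings(distinguished_name, gpo_links):
    """Merge GPO settings from the domain down to the closest OU."""
    settings = {}
    for container in containers_root_first(distinguished_name):
        for gpo in gpo_links.get(container, []):
            settings.update(gpo)  # later (closer) GPOs override earlier ones
    return settings


# Hypothetical GPO links: container DN -> list of GPO setting dictionaries.
GPO_LINKS = {
    "DC=example,DC=com": [{"password_min_length": 8, "screen_lock_minutes": 15}],
    "OU=Engineering,OU=Staff,DC=example,DC=com": [{"screen_lock_minutes": 5}],
}

if __name__ == "__main__":
    user_dn = "CN=Alice,OU=Engineering,OU=Staff,DC=example,DC=com"
    print(effective_settings(user_dn, GPO_LINKS))
```

In this sketch the user in the Engineering OU keeps the domain-wide password rule but gets the stricter screen lock setting from her own OU.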

When people say “Active Directory” they typically are referring to “Active Directory Domain Services.” It is important to note that there are other Active Directory roles/products such as Certificate Services, Federation Services, Lightweight Directory Services, Rights Management Services, etc. This answer refers specifically to Active Directory Domain Services.


https://serverfault.com/questions/402580/what-is-active-directory-domain-services-and-how-does-it-work

Project Management

My notes on what a project is, and how to run and close one successfully.

 

A project is temporary in that it has a defined beginning and end in time, and therefore defined scope and resources.

And a project is unique in that it is not a routine operation, but a specific set of operations designed to accomplish a singular goal. So a project team often includes people who don’t usually work together – sometimes from different organizations and across multiple geographies.

The development of software for an improved business process, the construction of a building or bridge, the relief effort after a natural disaster, the expansion of sales into a new geographic market — all are projects.

And all must be expertly managed to deliver the on-time, on-budget results, learning and integration that organizations need.

From: https://www.pmi.org/about/learn-about-pmi/what-is-project-management

 

5 steps in project management.

  1. Initiating
  2. Planning
  3. Executing
  4. Monitoring and Controlling
  5. Closing

Initiating

  • Bring in the right people, with a clear goal.

Define

  • Scope
  • Time
  • Quality
  • Communication method
  • What needs to be bought, subscribed to, or ordered

Planning

  • Write down everything to do (Gantt chart): features, stories, tasks
  • Risk management

Executing

  • Just do it
  • Are assignments OK?

Monitoring and Controlling

Check:

  • Is it on schedule?
  • Is quality OK?
  • Is all the work staying within scope?
  • Is initial risk management good enough?

Closing

  • Check whether the project has (1) reached the end of its schedule and (2) met its objectives.