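To stay compatible with Hive, Shark reuses Hive's HQL parsing, logical plan translation, and plan optimization logic. To a first approximation, it only swaps the physical execution plan from MapReduce (MR) jobs to Spark jobs (supplemented by optimizations that have little to do with Hive, such as in-memory columnar storage). It also depends on the Hive Metastore and Hive SerDe (for compatibility with the existing Hive storage formats). This strategy causes two problems. First, plan optimization depends entirely on Hive, which makes it inconvenient to add new optimization strategies. Second, because MR parallelism is process-level, the Hive code was written without much attention to thread safety, so Shark has to use a separately maintained, patched branch of the Hive source (I am not sure why those changes were never merged into the Hive mainline).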
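Spark SQL solves both problems. First, at the Hive compatibility layer, Spark SQL depends only on the HQL parser, the Hive Metastore, and Hive SerDe. In other words, once HQL has been parsed into an abstract syntax tree (AST), Spark SQL takes over entirely. Plan generation and optimization are both handled by Catalyst. Thanks to Scala's functional language features such as pattern matching, writing plan optimization strategies with Catalyst is far more concise than doing so in Hive. At last year's Spark Summit, Catalyst's author Michael Armbrust gave a brief introduction to Catalyst (2013 | Spark Summit). Second, compared with Shark, because the dependency on Hive has been cut back further, Spark SQL no longer needs to maintain its own patched Hive branch. Going forward, Shark will adopt Spark SQL as its engine across the board, not only for query optimization.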
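
To make the "optimization rules as pattern matches over a tree" idea concrete, here is a minimal sketch in plain Python. It is not Catalyst code: the `Literal`, `Attribute`, and `Add` classes and the `transform_up` and `constant_fold` helpers are hypothetical stand-ins invented for illustration. It only shows the general shape of a rule: match a small tree pattern, return a rewritten node, and leave everything else untouched.

```python
from dataclasses import dataclass


# Toy expression nodes (hypothetical, not Catalyst's classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform_up(node, rule):
    """Rewrite the children first, then offer the rebuilt node to the rule."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def constant_fold(node):
    """Rule: Add(Literal, Literal) becomes one Literal; other nodes pass through."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node


# x + (1 + 2): the constant subtree can be evaluated ahead of time.
expr = Add(Attribute("x"), Add(Literal(1), Literal(2)))
folded = transform_up(expr, constant_fold)
print(folded)
```

Each rule is a small function of one node, so adding a new optimization means writing another such function rather than touching the traversal. In Scala the same match would typically be written as a `case` clause, which is where the conciseness mentioned above comes from.
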
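In addition to being HQL-compatible and speeding up queries and analysis over existing Hive data, Spark SQL supports relational queries directly on native RDD objects. Besides HQL, Spark SQL also has a built-in, stripped-down SQL parser and a Scala DSL. In other words, if you only use Spark SQL's built-in SQL dialect or the Scala DSL to run relational queries over native RDD objects, you do not need to depend on anything from Hive when developing a Spark application.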
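Being able to run relational queries on native RDD objects, in my view, greatly lowers the barrier to entry. Partly, of course, because more people know SQL than know the Spark API; and partly because Spark SQL sits on top of a Catalyst-driven query plan optimization engine. Although Spark outperforms Hadoop MapReduce by a wide margin in many respects, its runtime model is also considerably more complex than MapReduce's, which makes performance tuning of Spark applications rather tricky. In terms of code size, a Spark application is often a fraction the size of the equivalent MR application, but writing an efficient Spark application against the bare Spark API still takes some thought. This is where Spark SQL's advantage shows: even when a user's query is not written efficiently, Catalyst can automatically apply a series of common optimization strategies.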
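
As an illustration of that last point, the sketch below shows one common rewrite an optimizer can apply on the user's behalf: moving a filter beneath a projection so that rows are discarded closer to the scan. This is a toy model in plain Python, not Catalyst's implementation. The `Scan`, `Project`, and `Filter` classes and the `push_filter_below_project` rule are hypothetical names made up for this example.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Toy logical plan nodes (hypothetical, not Spark SQL's classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object


@dataclass(frozen=True)
class Filter:
    column: str  # the single column the predicate reads
    op: str
    value: int
    child: object


def push_filter_below_project(plan):
    """Rule: Filter(Project(x)) becomes Project(Filter(x)) when the filtered
    column is one the projection passes through unchanged."""
    if (
        isinstance(plan, Filter)
        and isinstance(plan.child, Project)
        and plan.column in plan.child.columns
    ):
        proj = plan.child
        return Project(proj.columns, Filter(plan.column, plan.op, plan.value, proj.child))
    return plan


def transform_down(plan, rule):
    """Apply the rule to a node, then recurse into the (possibly new) child."""
    plan = rule(plan)
    if isinstance(plan, (Project, Filter)):
        return replace(plan, child=transform_down(plan.child, rule))
    return plan


# The plan as the user wrote it: project first, filter afterwards.
naive = Filter("age", ">", 21, Project(("name", "age"), Scan("people")))
optimized = transform_down(naive, push_filter_below_project)
print(optimized)
```

With the bare RDD API the order of operations is whatever the user wrote. A plan optimizer, by contrast, is free to reorder operators as long as the result is the same.
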