
Big data review case

happybase syntax for operating HBase:

import happybase
# Connect to the HBase Thrift server (the default parameters are shown explicitly)
conn = happybase.Connection(host='localhost', port=9090, timeout=None, autoconnect=True, table_prefix=None, table_prefix_separator=b'_', compat='0.98', transport='buffered', protocol='binary')
# Create the 'shop' table with two column families
conn.create_table(
	'shop',
	{
		'interfaceInfo': dict(max_versions=4),
		'inputInfo': dict(max_versions=4)
	}
)
# Insert data using a batch
table = conn.table("shop")
with table.batch() as bat:
	bat.put('0001', {'interfaceInfo:inter_show': 'HDMI', 'interfaceInfo:inter_network': '10Mbps', 'interfaceInfo:inter_three': '1', 'interfaceInfo:inter_Type-c': '1'})
	bat.put('0001', {'inputInfo:input_one': 'pointing stick', 'inputInfo:input_two': 'full-size keyboard', 'inputInfo:input_three': 'multi-touch', 'inputInfo:input_four': 'multi-touch'})
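
To verify the write, you can read the row back. The following is a minimal sketch that assumes the table and conn objects created above; the row key '0001' matches the example data:

# Read a single row back by its row key
row = table.row('0001')
print(row)

# Scan the table, restricted to the interfaceInfo column family
for key, data in table.scan(columns=['interfaceInfo']):
    print(key, data)

# Close the connection when finished
conn.close()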

MapReduce syntax (using the transformer case as an example)

mapper.py
#! /usr/bin/python3
# After the map phase, Hadoop streaming automatically sorts the intermediate results by key
import sys

def mapper(line):
    # The third comma-separated field is the value used for classification
    key = float(line.split(",")[2])

    cat = ''
    if key <= 630.00:
        cat = "mini"
    elif key <= 6300:
        cat = "mid"
    else:
        cat = "over"
    # Emit "category<TAB>1" so the reducer can sum the counts
    print("%s\t%s" % (cat, 1))



def main():
    for line in sys.stdin:
        line = line.strip()
        # Skip header lines instead of stopping the whole job
        if line.startswith('child'):
            continue
        else:
            mapper(line)

if __name__ == '__main__':
    main()
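
To check the mapper on its own, you can feed it a few sample lines. The records below are hypothetical; only the third comma-separated field matters for the classification:

# Hypothetical sample records; the third field drives the classification
samples = ["a,b,500", "c,d,2000", "e,f,9000"]
for s in samples:
    mapper(s)
# Expected output:
# mini	1
# mid	1
# over	1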

reduce.py
#! /usr/bin/python3

import sys

def reducer(k, values):
    print("%s:\t:%s" % (k, sum(values)))

def main():
    current_key = None
    values = []
    akey, avalue = None, None

    for line in sys.stdin:
        line = line.strip()

        # Each mapper output line is "key<TAB>value"; skip malformed lines
        try:
            akey, avalue = line.split('\t')
        except ValueError:
            continue
        # Input arrives sorted by key, so identical keys are adjacent
        if current_key == akey:
            values.append(int(avalue))
        else:
            if current_key:
                reducer(current_key, values)
                values = []
            values.append(int(avalue))
            current_key = akey
    # Emit the final group after the loop ends
    if current_key == akey:
        reducer(current_key, values)

if __name__ == '__main__':
    main()
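
Before submitting the job, the mapper and reducer can be tested locally by simulating the shuffle with a pipe, for example: cat transformer.csv | python3 mapper.py | sort | python3 reduce.py (the file name transformer.csv is only a placeholder for your actual input). On a cluster, the same two scripts are passed to the Hadoop Streaming jar; the exact jar path depends on your Hadoop installation.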

Spark syntax

Two ways to create an RDD
# 1. Initialize the SparkContext, the entry point of a Spark program; 'Simple App' is the
#    application name, which you would normally choose to be something easy to understand
from pyspark import SparkContext
sc = SparkContext("local", "Simple App")

# If the data is something you build inside the program (e.g. a Python list), use parallelize to create the RDD
# 2. Create a Python list containing 1 to 5
data = [i for i in range(1, 6)]

# 3. Create the RDD in parallel through the SparkContext
rdd = sc.parallelize(data)

# If the RDD is created from an external file instead, use the following statement
rdd = sc.textFile("/root/wordcount.txt")
The next step is to use the required operators (transformations and actions) to complete the task, as in the word-count sketch below.
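
As an illustration, here is a minimal word-count sketch. It assumes the /root/wordcount.txt path used above; flatMap, map, reduceByKey and collect are standard RDD operators:

from pyspark import SparkContext

sc = SparkContext("local", "Simple App")

# Build the RDD from the text file, split each line into words,
# map every word to (word, 1), then sum the counts per word
rdd = sc.textFile("/root/wordcount.txt")
counts = (rdd.flatMap(lambda line: line.split())
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b))

# collect() is an action that brings the results back to the driver
for word, count in counts.collect():
    print(word, count)

sc.stop()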

The educoder page for the operator part is as follows:
https://www.educoder.net/shixuns/imf67y2q/challenges

Copyright notice
This article was created by [osc_ 7oc4d1en]. Please include the original link when reposting. Thank you.
https://cdmana.com/2020/12/20201224100744083M.html
