
spark install (EC2 free tier, Ubuntu 20.04)

오지구영ojjy90 2022. 4. 5. 12:23

Installation environment

EC2 free tier

Ubuntu 20.04

Update the system

$ sudo apt update

$ sudo apt -y upgrade

$ [ -f /var/run/reboot-required ] && sudo reboot -f

The last command reboots the instance only if the upgrade left the /var/run/reboot-required flag behind.

 

 

Install Java

$ sudo apt install curl mlocate default-jdk -y

Check the installation and the Java version

$ java --version
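The JAVA_HOME value used later assumes OpenJDK 11 in its default Ubuntu location. To double-check where the default-jdk package actually put the JDK (an extra step, not in the original post), resolve the java binary:

$ readlink -f $(which java)

On Ubuntu 20.04 this should print a path like /usr/lib/jvm/java-11-openjdk-amd64/bin/java; the directory above bin/ is the JAVA_HOME to export below.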

 

Download the Spark installation file

$ wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
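dlcdn.apache.org only hosts the most recent releases, so this exact URL may stop working once 3.2.1 is superseded. If it does, the same tarball should be available from the Apache archive (URL assumed from the standard archive layout):

$ wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz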

Extract the installation file

$ tar -xvf spark-3.2.1-bin-hadoop3.2.tgz

$ sudo mv spark-3.2.1-bin-hadoop3.2/ /opt/spark
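A quick sanity check that the move worked (my addition, not in the original post) is to list the launcher scripts:

$ ls /opt/spark/bin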

 

Set environment variables

$ vim ~/.bashrc

 

##############################################################################
# use python3/pip3 by default (handy for PySpark later)
alias python='python3'
alias pip='pip3'
# JDK location installed by the default-jdk package on Ubuntu 20.04
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export SPARK_HOME=/opt/spark
# put the Spark launcher (bin) and daemon scripts (sbin) on the PATH
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
# let Python find the PySpark modules shipped with Spark
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/build:${PYTHONPATH}"
##############################################################################

 

$ source ~/.bashrc
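To confirm the new variables are in effect (an extra verification step, not in the original post):

$ echo $SPARK_HOME
$ spark-submit --version

The first should print /opt/spark and the second should report Spark 3.2.1.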

 

 

Start the Spark master

$ start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.master.Master-1-ip-172-31-40-209.out

 

Check that the master web UI is listening on port 8080:

$ sudo ss -tunelp | grep 8080
tcp    LISTEN  0       1                             *:8080              *:*     users:(("java",pid=4496,fd=252)) uid:1000 ino:31309 sk:6 v6only:0 <->       
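The web UI should also be reachable from a browser at http://<EC2 public IP>:8080, provided the instance's security group allows inbound traffic on port 8080 (that rule is not covered in this post). From the instance itself, a minimal check I added is:

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080

which should print 200.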

 

Start a worker and attach it to the master

$ start-worker.sh spark://ubuntu:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-ip-172-31-40-209.out
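The master URL must match the host the master actually bound to; it is printed in the master log and shown at the top of the 8080 web UI. If spark://ubuntu:7077 does not resolve on your instance, using the current hostname is one option (adjust to whatever the UI reports):

$ start-worker.sh spark://$(hostname):7077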

 

Refresh the mlocate database so the newly installed Spark files are indexed:

$ sudo updatedb
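After that, locate can find the new files, for example:

$ locate spark-shell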

 

 

Launch the Spark shell

$ /opt/spark/bin/spark-shell
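Once the scala> prompt appears, a one-line job (my own quick check, not from the original post) confirms the whole setup works end to end:

scala> spark.range(1, 1000).count()

which should return 999 (the range excludes the upper bound).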

 

Execution screen (screenshot in the original post)

 

Next post: reading MySQL data with PySpark

 

Reference

https://computingforgeeks.com/how-to-install-apache-spark-on-ubuntu-debian/
