Exception: Python in worker has different version 2.7 than that in driver 3.6
Resolved: Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
This error was reported when running a PySpark program on an Alibaba Cloud server.
Server environment: CentOS with two Python installations, python (Python 2, the default) and python3.
pyspark==2.1.2 is installed in the python3 environment. Note that the pyspark version needs to match the installed Spark version (Spark 2.1.1 is installed here).
Running the script with python3 xxx.py produces the following error:
[root@way code]# python3 foreach.py
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/12/17 15:30:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/12/17 15:30:27 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.16.1.186 instead (on interface eth0)
20/12/17 15:30:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/12/17 15:30:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/software/spark/python/lib/pyspark.zip/pyspark/worker.py", line 125, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/12/17 15:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/software/spark/python/lib/pyspark.zip/pyspark/worker.py", line 125, in main
("%d.%d" % sys.version_info[:2], version))
Solution:
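The error message says that the worker and the driver are running different Python versions. The script is started with python3, so the driver runs Python 3.6. PYSPARK_PYTHON is not set, so the worker is launched with the default python command, which on this server is Python 2.7. PySpark refuses to run when the minor versions differ, so PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON should both point to python3.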
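The comparison behind this exception can be sketched roughly as below. This is an illustration based on the line shown in the traceback ("%d.%d" % sys.version_info[:2]), not PySpark's actual source; the helper names minor_version and check_worker_version are hypothetical.

```python
import sys


def minor_version(version_info=sys.version_info):
    # Same formatting as the traceback line: "%d.%d" % sys.version_info[:2]
    return "%d.%d" % tuple(version_info[:2])


def check_worker_version(worker_version, driver_version):
    """Raise if the two "major.minor" strings differ (hypothetical helper)."""
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in driver %s"
            % (worker_version, driver_version)
        )


# The situation from the log: worker on 2.7, driver on 3.6.
try:
    check_worker_version("2.7", "3.6")
except Exception as exc:
    print(exc)
```

Only the major and minor numbers are compared, so a worker on 3.6.5 and a driver on 3.6.8 would pass this kind of check.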
Use the which python3 command to find the location of python3, then set the two variables to that path in the program, as follows:
from pyspark import SparkContext
# The following three lines are new; they must run before the SparkContext is created
import os
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"
Save the file and run it again; the program now executes normally.
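If hard-coding /usr/bin/python3 is not desirable, one possible variation is to reuse the interpreter that is already running the script. The sketch below assumes local mode, where the worker runs on the same machine as the driver; pin_pyspark_python is a hypothetical helper name, not part of PySpark.

```python
import os
import sys


def pin_pyspark_python(interpreter=None):
    """Point both PySpark variables at one interpreter.

    Defaults to the interpreter running this script (sys.executable),
    so the worker uses the same Python as the driver.
    """
    path = interpreter or sys.executable
    os.environ["PYSPARK_PYTHON"] = path
    os.environ["PYSPARK_DRIVER_PYTHON"] = path
    return path


pin_pyspark_python()

# Create the SparkContext only after the variables are set, for example:
# from pyspark import SparkContext
# sc = SparkContext("local", "demo")
```

On a real cluster the same path would have to exist on every worker node, so an explicit path such as /usr/bin/python3 may still be the safer choice there.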