Exception: Python in worker has different version 2.7 than that in driver 3.6

Resolved: Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

When running a PySpark program on an Alibaba Cloud server, the core error reported is the one shown above.

Server environment: CentOS with both python (Python 2 by default) and python3 installed, i.e. a dual-Python environment.
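On a dual-Python box like this, the split can be confirmed from the shell; the interpreter versions below are taken from the error message and may differ on other machines:

python --version     # Python 2.7.x  (system default)
python3 --version    # Python 3.6.x  (the interpreter pyspark is installed into)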

pyspark==2.1.2 is installed into the Python 3 environment. Note that the pyspark package version must match the installed Spark version (the installed Spark version is 2.1.1).
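If you are unsure whether the two versions line up, they can be compared from the shell; this is a quick sketch that assumes pip3 and spark-submit are on the PATH:

pip3 show pyspark | grep Version    # pyspark package version, here 2.1.2
spark-submit --version              # installed Spark version, here 2.1.1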

Running the script with python3 xxx.py produces the error below:

[root@way code]# python3 foreach.py 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/12/17 15:30:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/12/17 15:30:27 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.16.1.186 instead (on interface eth0)
20/12/17 15:30:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/12/17 15:30:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/software/spark/python/lib/pyspark.zip/pyspark/worker.py", line 125, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/12/17 15:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/software/spark/python/lib/pyspark.zip/pyspark/worker.py", line 125, in main
    ("%d.%d" % sys.version_info[:2], version))

Solution:

The error message shows that the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON should point to python3. Because the default python is version 2, the workers are launched with Python 2.7, which does not match the Python 3.6 driver and lacks libraries such as pyspark, so the error is raised.

Use the which python3 command to find the location of the Python 3 interpreter, then set the two variables above to that path in the program, as follows:

from pyspark import SparkContext

# The following three lines are the new content: point both the driver and
# the workers at the same Python 3 interpreter before the SparkContext is created.
import os
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"

Save the file and run it again, and it executes normally.
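An equivalent fix that does not touch the code is to export the same variables in the shell (or add them to $SPARK_HOME/conf/spark-env.sh) before launching the script; the interpreter path below is an assumption, use the output of which python3 on your machine:

export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
python3 foreach.py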
