Spark error Could not initialize class org.xerial.snappy.Snappy

Submitted 3 years, 4 months ago
Ticket #314
Views 1268
Language/Framework Other
Priority Medium
Status Closed

We have a Spark application running on a cluster. After adding a new Spark worker, it started throwing this error:

Job aborted due to stage failure: Task 0 in stage 9903.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9903.0 (TID 32740, 156.140.6.71, executor 5): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
	at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
	at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:263)
	at org.apache.parquet.hadoop.DictionaryPageReader.reusableCopy(DictionaryPageReader.java:117)
	at org.apache.parquet.hadoop.DictionaryPageReader.readDictionaryPage(DictionaryPageReader.java:100)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.expandDictionary(DictionaryFilter.java:80)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:180)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:50)
	at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:195)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:360)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:50)
	at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:360)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.visit(DictionaryFilter.java:50)
	at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
	at org.apache.parquet.filter2.dictionarylevel.DictionaryFilter.canDrop(DictionaryFilter.java:59)
	at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:104)
	at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:43)
	at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
	at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:69)
	at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:751)
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:644)
	at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:148)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:131)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:418)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage16.scan_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage16.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:636)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:255)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:836)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:

Can anyone help me with this issue?

Submitted on Dec 03, 20

1 Answer

Verified

Check that your Spark application has access to the folders below.

Step 1:

Make sure the user running the Spark executors has read access to the paths below (see the sketch after this list):

  1. $HADOOP_HOME/share/hadoop/common/lib/
  2. $HADOOP_HOME/share/hadoop/mapreduce/lib/
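
As a quick check (not part of the original answer), you can confirm those directories are readable and actually contain a snappy-java jar by running a small Scala snippet as the same user that launches the executors. The /opt/hadoop fallback below is only an assumed default:

  import java.io.File

  // Hypothetical fallback for HADOOP_HOME; adjust to your environment.
  val hadoopHome = sys.env.getOrElse("HADOOP_HOME", "/opt/hadoop")

  val libDirs = Seq(
    new File(s"$hadoopHome/share/hadoop/common/lib"),
    new File(s"$hadoopHome/share/hadoop/mapreduce/lib")
  )

  // Report whether each lib directory is readable and contains snappy-java.
  libDirs.foreach { dir =>
    val readable = dir.isDirectory && dir.canRead
    val jars = if (readable) dir.listFiles().map(_.getName).toSeq else Seq.empty[String]
    val hasSnappy = jars.exists(_.startsWith("snappy-java"))
    println(s"${dir.getPath} readable=$readable, snappy-java present=$hasSnappy")
  }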

Step 2:

Try adding this line to your spark-defaults.conf file:

spark.executor.extraJavaOptions -Djava.io.tmpdir=/path/tmp -Dorg.xerial.snappy.tempdir=/path/tmp
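
If you would rather set this per application than cluster-wide, the same JVM options can be passed when building the SparkSession. This is only a sketch: the app name and /path/tmp are placeholders, and the temp directory must exist and be writable (and not mounted noexec) on every worker, since snappy-java extracts its native library there at runtime:

  import org.apache.spark.sql.SparkSession

  // /path/tmp is a placeholder; point it at a directory that exists and is
  // writable on every worker node.
  val spark = SparkSession.builder()
    .appName("snappy-tmpdir-example")
    .config("spark.executor.extraJavaOptions",
      "-Djava.io.tmpdir=/path/tmp -Dorg.xerial.snappy.tempdir=/path/tmp")
    .getOrCreate()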

Submitted 3 years, 4 months ago

Thanks. It's resolved; your answer helped.

- Vengat 3 years, 4 months ago
