Java is probably the best language to learn for big data, for one main reason: MapReduce, HDFS, Storm, Kafka, Spark, Apache Beam, and Scala are all part of the JVM (Java Virtual Machine) ecosystem. Java is also among the most widely tested and proven languages in production use.
Although Java may appear to be unrelated to data science, many frameworks that run on the JVM, including Hadoop, are essential components of the data stack. Hadoop is a software framework for storing and processing large amounts of data across distributed clusters of machines. Because it spreads both storage and computation across many nodes, it can process very large datasets and run many tasks in parallel.
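To make the idea of splitting work into independent tasks more concrete, here is a minimal sketch of a MapReduce-style word count. It uses only the standard Java streams library, not the Hadoop API, and runs in a single process, so it only illustrates the map and reduce phases rather than real cluster execution. The class name `WordCountSketch` and the methods `mapLine` and `countWords` are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: an in-process analogy for MapReduce, not Hadoop code.
public class WordCountSketch {

    // "Map" phase: turn one line of text into a list of lowercase words.
    static List<String> mapLine(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // "Reduce" phase: group identical words and count each group.
    // parallelStream() lets the JVM process lines on several threads,
    // loosely mirroring how a cluster processes input splits in parallel.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> mapLine(line).stream())
                .collect(Collectors.groupingBy(
                        Function.identity(),   // key: the word itself
                        TreeMap::new,          // sorted map for stable output
                        Collectors.counting()  // value: occurrences of the word
                ));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "big data on the JVM",
                "the JVM runs Hadoop",
                "Hadoop stores big data");
        System.out.println(countWords(lines));
    }
}
```

In a real Hadoop job, the map and reduce steps would run as separate tasks on different machines, with the framework moving intermediate data between them. The sketch keeps only the shape of the computation.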