Apache Tika

Description

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Home page for this solution: https://tika.apache.org/

Overview

| Key | Value |
| --- | --- |
| Name | tika |
| Description | The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). |
| License | Apache License 2.0 |
| Programming Language | Java |
| Created | 2009-05-21 |
| Last update | 2025-03-31 |
| Github Stars | 2888 |
| Project Home Page | https://tika.apache.org/ |
| Code Repository | apache/tika |
| OpenSSF Scorecard | Report |

Note:

  • Created is the date on which the repository was created on GitHub.

  • Last update is only the date on which I last ran an automatic check.

  • Do not read too much into GitHub stars. It's a vanity metric! Star counts are misleading and don't indicate whether the SBB is high-quality or very popular.