Releases: Unstructured-IO/unstructured

0.20.2

13 Feb 03:04

Release 0.20.2

0.20.1

12 Feb 02:44
78e21ca

What's Changed

Full Changelog: 0.19.3...0.20.1

0.19.3

11 Feb 19:17
95b953d

What's Changed

Full Changelog: 0.18.32...0.19.3

0.18.32

10 Feb 22:26
4bbb1ff

What's Changed

Full Changelog: 0.18.31...0.18.32

0.18.31

27 Jan 15:29
d1f1bdf

What's Changed

  • Feat: patch pdfminer and use rendermode to detect invisible text by @badGarnet in #4158
  • fix: add EN DASH to UNICODE_BULLETS for clean_bullets by @MkDev11 in #4186
  • fix: fix version number by @badGarnet in #4189
  • enhancement: render pdfs with pdfium by @qued in #4185
  • feat: consider rotated text as low fidelity by @badGarnet in #4190
  • fix: address jaraco CVE by @qued in #4198
  • fix: change default for languages parameter from ["auto"] to None by @eureka928 in #4194
  • ⚡️ Speed up function _get_optimal_value_for_bbox by 2,883% by @aseembits93 in #4181
  • ⚡️ Speed up method _DocxPartitioner._style_based_element_type by 593% by @aseembits93 in #4179
  • Luke/update dockerfile by @luke-kucing in #4192
  • fix: reduce default dpi to 350 by @qued in #4199
  • fix(deps): switch from pip-compile to uv pip compile by @lawrence-u10d in #4202
  • fix: remove sandbox=True from pypandoc to fix ODT conversion by @MkDev11 in #4193
  • Token-Based Chunking Support by @eureka928 in #4203
  • fix: filter coordinates kwargs to prevent TypeError in hi_res PDF processing by @MkDev11 in #4206
  • fix(deps): Update docker.elastic.co/elasticsearch/elasticsearch Docker tag to v8.19.10 by @utic-renovate[bot] in #4133
  • fix(deps): Update opensearchproject/opensearch Docker tag to v2.19.4 by @utic-renovate[bot] in #4134
  • fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.3 by @utic-renovate[bot] in #4135
  • fix: Preserve Line Breaks in Code Blocks During Chunking by @eureka928 in #4196
  • chore: dep bump to resolve open CVEs by @luke-kucing in #4205
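
The EN DASH fix in #4186 concerns which characters are treated as list bullets when leading bullets are stripped from text. The sketch below illustrates the general idea only. It does not import unstructured, and the names BULLETS, BULLET_RE, and strip_leading_bullet are hypothetical stand-ins, not the library's UNICODE_BULLETS or clean_bullets. It assumes a bullet counts only when it starts the line and is followed by whitespace.

```python
import re

# Hypothetical subset of bullet characters, for illustration only.
# The library's real UNICODE_BULLETS list is not reproduced here.
BULLETS = ["\u2022", "\u2023", "\u25E6", "\u2043", "\u2013", "-", "*"]

# Match one bullet at the start of the string, followed by whitespace.
BULLET_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(b) for b in BULLETS) + r")\s+"
)


def strip_leading_bullet(text: str) -> str:
    """Remove a single leading bullet and the whitespace after it."""
    return BULLET_RE.sub("", text, count=1)


print(strip_leading_bullet("\u2013 first item"))   # EN DASH used as a bullet
print(strip_leading_bullet("2013\u20132014 report"))  # EN DASH inside text is kept
```

Anchoring the pattern at the start of the string is a design choice of this sketch: it keeps an EN DASH used in a range such as 2013–2014 from being treated as a bullet.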

New Contributors

Full Changelog: 0.18.28...0.18.31

0.18.28

09 Jan 19:24
82532ca

Enhancement

  • Optimize clean_extra_whitespace_with_index_run (codeflash)
  • Optimize recursive_xy_cut_swapped (codeflash)
  • Optimize _DocxPartitioner._parse_category_depth_by_style_name (codeflash)
  • Optimize VertexAIEmbeddingEncoder._add_embeddings_to_elements (codeflash)
  • Optimize ngrams (codeflash)
  • Optimize stage_for_datasaur (codeflash)

0.18.27

08 Jan 00:01
e3c4b52

0.18.27

Fixes

  • Comment no-ops in zoom_image (codeflash)
  • Fix an issue where elements with partially filled extracted text are marked as extracted

Enhancement

  • Optimize sentence_count (codeflash)
  • Optimize _PartitionerLoader._load_partitioner (codeflash)
  • Optimize detect_languages (codeflash)
  • Optimize contains_verb (codeflash)
  • Optimize get_bbox_thickness (codeflash)
  • Upgrade pdfminer-six to 20260107 to fix ~15-18% performance regression from eager f-string evaluation
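
The pdfminer-six note above attributes the regression to eager f-string evaluation. As a general illustration of that pattern (the release notes do not say where in pdfminer-six it occurred, so the logging setting here is an assumption), the sketch below counts how often an object is formatted when the message is built with an f-string versus passed as a deferred argument. The Expensive class and the "demo" logger are hypothetical.

```python
import logging


class Expensive:
    """Counts how many times it is formatted via repr()."""

    def __init__(self) -> None:
        self.calls = 0

    def __repr__(self) -> str:
        self.calls += 1
        return "Expensive()"


logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled

eager, lazy = Expensive(), Expensive()

# The f-string is evaluated before debug() is even called,
# so the formatting cost is paid although nothing is logged.
logger.debug(f"value: {eager!r}")

# With %-style arguments, formatting is deferred and skipped
# entirely because DEBUG is not enabled for this logger.
logger.debug("value: %r", lazy)

print(eager.calls, lazy.calls)
```

In a hot path that runs once per character or layout object, the eager form pays the formatting cost on every call even when the output is discarded.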

0.18.26

05 Jan 21:41
ae0efca

0.18.26

Fixes

  • Pin deltalake<1.3.0 to fix ARM64 Docker builds (1.3.0 missing Linux ARM64 wheels)

0.18.25

Fixes

  • Security update: Removed pdfminer.six version constraint and bumped pdfminer.six and urllib3 to address high severity CVEs

0.18.24

30 Dec 17:54
7f2cb4c

Enhancement

  • Optimize OCRAgentTesseract.extract_word_from_hocr (codeflash)

Fixes

  • Security update: Bumped dependencies to address security vulnerabilities

0.18.22

10 Dec 17:56
afd9118

0.18.22

Enhancement

Features

Fixes

  • fix(deps): Bump fonttools to address cve by @CyMule in #4125

Full Changelog: 0.18.21...0.18.22