Skip to content

Rewrite OBJ/MTL parser to byte-stream based with StreamReader, add mmap support, and integrate Windows long path/Unicode support#416

Open
Copilot wants to merge 23 commits intoreleasefrom
copilot/harden-obj-mtl-parsing
Open

Rewrite OBJ/MTL parser to byte-stream based with StreamReader, add mmap support, and integrate Windows long path/Unicode support#416
Copilot wants to merge 23 commits intoreleasefrom
copilot/harden-obj-mtl-parsing

Conversation

Copy link
Contributor

Copilot AI commented Mar 3, 2026

  • Confirm actionable scope for new comment (4009096646)
  • Check recent CI workflow runs and inspect failures/logs
  • Perform final stage-3 deep code/security review with maximum effort
  • Apply additional fixes and add regression tests
  • Run targeted tests and full make check
  • Run code review and CodeQL checks
  • Reply on PR comment with final stage-3 outcome and commit hash

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

…oading

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copilot AI changed the title [WIP] Make .obj/.mtl parsing robust and efficient Harden OBJ/MTL parsing and add mmap support for large file I/O Mar 3, 2026
Copilot finished work on behalf of syoyo March 3, 2026 17:21
@syoyo
Copy link
Collaborator

syoyo commented Mar 3, 2026

@copilot rewrite line based and raw token pointer scanning/parsing to byte-stream based. see https://raw.githubusercontent.com/lighttransport/tinyusdz/refs/heads/release/src/stream-reader.hh for the reference.

rewrite IS_NEW_LINE, IS_SPACE, parse***, strspn, etc. to use the byte-stream based token parsing(add eof guard for each token consumption)

Copilot AI and others added 2 commits March 3, 2026 18:01
Replace safeGetline + raw const char* token pointer scanning with a
StreamReader class that wraps (buf, len, idx) with bounds-checked
peek/advance/match operations. Every byte access is guarded by an EOF
check.

Key changes:
- Add StreamReader class for bounds-checked text parsing
- Add sr_* parse functions that use StreamReader instead of const char**
- Add LoadMtlInternal, LoadObjInternal, LoadObjWithCallbackInternal
  using StreamReader; public APIs wrap these
- mmap paths construct StreamReader directly from mapped memory
- Remove safeGetline (no longer needed)
- Keep old parse functions for backward compat with public
  ParseTextureNameAndOption API
- Keep IS_SPACE, IS_NEW_LINE, IS_DIGIT macros for old parse functions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ked parsing

- Add StreamReader class wrapping (buf, length) with EOF-guarded peek/advance/match
- Add sr_* parse functions using StreamReader instead of const char** pointers
- Add LoadMtlInternal, LoadObjInternal, LoadObjWithCallbackInternal taking StreamReader
- mmap paths construct StreamReader directly from mapped memory
- Remove safeGetline (no longer needed)
- Keep old parse functions + macros for ParseTextureNameAndOption backward compat

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 3, 2026

@copilot rewrite line based and raw token pointer scanning/parsing to byte-stream based. see https://raw.githubusercontent.com/lighttransport/tinyusdz/refs/heads/release/src/stream-rea...

Implemented in f2c657b and c57ed2c. The parser is now byte-stream based using a StreamReader class that wraps (const char *buf, size_t length) with EOF-guarded peek()/advance()/match() on every byte access.

Key changes:

  • StreamReader class with eof(), peek(), advance(), skip_space(), skip_line(), at_line_end(), match(), peek_at(), remaining(), etc.
  • sr_* parse functions (sr_parseString, sr_parseInt, sr_parseReal, sr_parseTriple, sr_parseRawTriple, etc.) all using StreamReader instead of raw const char** + strspn/strcspn
  • Internal functions LoadMtlInternal, LoadObjInternal, LoadObjWithCallbackInternal take StreamReader& and do all parsing with byte-stream scanning
  • safeGetline removed — StreamReader handles line-by-line byte scanning internally
  • mmap paths construct StreamReader directly from mapped memory (no double-buffering)
  • Old parse functions + IS_SPACE/IS_NEW_LINE macros preserved for ParseTextureNameAndOption public API backward compatibility

Copilot AI changed the title Harden OBJ/MTL parsing and add mmap support for large file I/O Rewrite OBJ/MTL parser to byte-stream based with StreamReader and add mmap support Mar 3, 2026
Copilot finished work on behalf of syoyo March 3, 2026 18:11
@syoyo syoyo marked this pull request as ready for review March 3, 2026 23:44
Copilot AI review requested due to automatic review settings March 3, 2026 23:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR rewrites the OBJ/MTL parser from line-based (safeGetline + raw const char* token scanning) to a byte-stream-based approach using a new StreamReader class, and adds opt-in memory-mapped I/O support (TINYOBJLOADER_USE_MMAP).

Changes:

  • Introduces StreamReader class with bounds-checked byte access, and rewrites all parse functions as sr_* variants consuming StreamReader instead of const char** pointers.
  • Converts LoadObj, LoadMtl, and LoadObjWithCallback to thin wrappers over new LoadObjInternal, LoadMtlInternal, and LoadObjWithCallbackInternal helpers.
  • Adds mmap support for both POSIX and Windows for both .obj and .mtl files, with tests for the new paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
tiny_obj_loader.h Core rewrite: StreamReader class, all sr_* parse functions, mmap infrastructure for Windows/POSIX, internal helper functions, and membuf/imemstream dead code
tests/tester.cc Two new tests: test_mmap_and_standard_load_agree (compares mmap vs stream results) and test_load_from_memory_buffer (verifies in-memory parsing consistency)
Comments suppressed due to low confidence (1)

tiny_obj_loader.h:3165

  • The mmap open/fstat/map/parse/unmap pattern is duplicated across six call sites: twice in MaterialFileReader::operator() (multi-path loop and single-path branch, each for both Windows and POSIX) and twice in LoadObj (Windows and POSIX). Extracting the common logic into a static helper (e.g., LoadMtlFromFile or LoadObjFromFile) would significantly reduce the code surface and the risk of the two copies diverging.
#ifdef TINYOBJLOADER_USE_MMAP
#if defined(_WIN32)
      {
        HANDLE hFile =
            CreateFileA(filepath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE) continue;
        LARGE_INTEGER fileSize;
        if (!GetFileSizeEx(hFile, &fileSize)) {
          CloseHandle(hFile);
          continue;
        }
        const size_t file_size = static_cast<size_t>(fileSize.QuadPart);
        if (file_size == 0) {
          CloseHandle(hFile);
          StreamReader empty_sr("", 0);
          LoadMtlInternal(matMap, materials, empty_sr, warn, err);
          return true;
        }
        HANDLE hMapping =
            CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (hMapping == NULL) {
          CloseHandle(hFile);
          continue;
        }
        const char *mmap_data = static_cast<const char *>(
            MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0));
        if (!mmap_data) {
          CloseHandle(hMapping);
          CloseHandle(hFile);
          continue;
        }
        {
          StreamReader sr(mmap_data, file_size);
          LoadMtlInternal(matMap, materials, sr, warn, err);
        }
        UnmapViewOfFile(mmap_data);
        CloseHandle(hMapping);
        CloseHandle(hFile);
        return true;
      }
#else  // POSIX mmap
      {
        int fd = open(filepath.c_str(), O_RDONLY);
        if (fd == -1) continue;
        struct stat sb;
        if (fstat(fd, &sb) != 0) {
          close(fd);
          continue;
        }
        const size_t file_size = static_cast<size_t>(sb.st_size);
        if (file_size == 0) {
          close(fd);
          StreamReader empty_sr("", 0);
          LoadMtlInternal(matMap, materials, empty_sr, warn, err);
          return true;
        }
        void *mapped = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (mapped == MAP_FAILED) continue;
        const char *mmap_data = static_cast<const char *>(mapped);
        {
          StreamReader sr(mmap_data, file_size);
          LoadMtlInternal(matMap, materials, sr, warn, err);
        }
        munmap(mapped, file_size);
        return true;
      }
#endif  // _WIN32
#else   // !TINYOBJLOADER_USE_MMAP
      std::ifstream matIStream(filepath.c_str());
      if (matIStream) {
        LoadMtl(matMap, materials, &matIStream, warn, err);

        return true;
      }
#endif  // TINYOBJLOADER_USE_MMAP
    }

    std::stringstream ss;
    ss << "Material file [ " << matId
       << " ] not found in a path : " << m_mtlBaseDir << "\n";
    if (warn) {
      (*warn) += ss.str();
    }
    return false;

  } else {
    std::string filepath = matId;

#ifdef TINYOBJLOADER_USE_MMAP
#if defined(_WIN32)
    {
      HANDLE hFile =
          CreateFileA(filepath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
      if (hFile != INVALID_HANDLE_VALUE) {
        LARGE_INTEGER fileSize;
        if (GetFileSizeEx(hFile, &fileSize)) {
          const size_t file_size = static_cast<size_t>(fileSize.QuadPart);
          if (file_size == 0) {
            CloseHandle(hFile);
            StreamReader empty_sr("", 0);
            LoadMtlInternal(matMap, materials, empty_sr, warn, err);
            return true;
          }
          HANDLE hMapping =
              CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
          if (hMapping != NULL) {
            const char *mmap_data = static_cast<const char *>(
                MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0));
            if (mmap_data) {
              {
                StreamReader sr(mmap_data, file_size);
                LoadMtlInternal(matMap, materials, sr, warn, err);
              }
              UnmapViewOfFile(mmap_data);
              CloseHandle(hMapping);
              CloseHandle(hFile);
              return true;
            }
            CloseHandle(hMapping);
          }
        }
        CloseHandle(hFile);
      }
    }
#else  // POSIX mmap
    {
      int fd = open(filepath.c_str(), O_RDONLY);
      if (fd != -1) {
        struct stat sb;
        if (fstat(fd, &sb) == 0) {
          const size_t file_size = static_cast<size_t>(sb.st_size);
          if (file_size == 0) {
            close(fd);
            StreamReader empty_sr("", 0);
            LoadMtlInternal(matMap, materials, empty_sr, warn, err);
            return true;
          }
          void *mapped = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
          close(fd);
          if (mapped != MAP_FAILED) {
            const char *mmap_data = static_cast<const char *>(mapped);
            {
              StreamReader sr(mmap_data, file_size);
              LoadMtlInternal(matMap, materials, sr, warn, err);
            }
            munmap(mapped, file_size);
            return true;
          }
        } else {
          close(fd);
        }
      }
    }
#endif  // _WIN32
#else   // !TINYOBJLOADER_USE_MMAP
    std::ifstream matIStream(filepath.c_str());
    if (matIStream) {
      LoadMtl(matMap, materials, &matIStream, warn, err);

      return true;
    }
#endif  // TINYOBJLOADER_USE_MMAP

    std::stringstream ss;
    ss << "Material file [ " << filepath
       << " ] not found in a path : " << m_mtlBaseDir << "\n";
    if (warn) {
      (*warn) += ss.str();
    }

    return false;
  }
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@syoyo
Copy link
Collaborator

syoyo commented Mar 4, 2026

@copilot resolve conflicts nicely. basically accept both changes.

syoyo and others added 2 commits March 4, 2026 09:31
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ent, BOM in callback, test fixes

- Add UTF8ToWchar/LongPathW helpers for Windows long path and Unicode support
- Add Windows long path support (#ifdef _WIN32) in non-mmap ifstream fallbacks
- Add vt_w vector and optional w component parsing in LoadObjInternal
- Fix texcoord_ws.swap(vt) -> texcoord_ws.swap(vt_w) corruption
- Add BOM stripping to LoadObjWithCallbackInternal
- Fix invalid-relative-texture-vertex-index.obj -> invalid-relative-texture-index.obj
- Add test models: texcoord-w.obj, texcoord-w-mixed.obj, utf8-path-test.obj/.mtl
- Add tests: texcoord_w, texcoord_w_mixed, utf8_path, long_path, callback_BOM

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 4, 2026

@copilot resolve conflicts nicely. basically accept both changes.

All changes from both release commits (#414, #415) have been manually integrated into our StreamReader-based branch in 5ef0c48:

All 51 tests pass. However, since I can't do git merge + git push, the git-level conflicts still show on GitHub. You'll need to merge release into this branch to resolve them — the content should match on both sides now.

Copilot AI changed the title Rewrite OBJ/MTL parser to byte-stream based with StreamReader and add mmap support Rewrite OBJ/MTL parser to byte-stream based with StreamReader, add mmap support, and integrate Windows long path/Unicode support Mar 4, 2026
Copilot finished work on behalf of syoyo March 4, 2026 00:37
syoyo and others added 3 commits March 4, 2026 09:47
…lease

Keep StreamReader-based parsing from the harden branch for all conflicting
regions: map_d parsing, mmap material loading, LoadObjInternal, texcoord
w-component, BOM handling, and main parse loop. Retain mmap/memory-buffer
tests in tester.cc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add column tracking (col_num_) to StreamReader, updated in all
  position-advancing methods (get, advance, skip_space, etc.)
- Add current_line_text(), format_error(filename, msg) for clang-style
  diagnostics (file:line:col with source line and caret)
- Add error stack to StreamReader (push_error, has_errors, get_errors)
- Add error-reporting sr_parse overloads that return bool for
  success/failure: sr_parseInt, sr_parseReal, sr_parseReal2/3,
  sr_parseVertexWithColor (returns -1 on error)
- Wire error-reporting overloads into LoadMtlInternal (Ka, Kd, Ks, Ni,
  Ns, illum, d, Tr, PBR properties) with early return on parse failure;
  change LoadMtlInternal from void to bool
- Wire error-reporting overloads into LoadObjInternal (v, vn, vt, vw)
  with early return on parse failure
- Upgrade existing error messages (face, line, points, skin weight) to
  use sr.format_error() for consistent clang-style output
- Pass filename through call chain: LoadObj -> LoadObjInternal,
  MaterialFileReader -> LoadMtlInternal
- Add warning_context.filename for fixIndex warnings
- Add 6 new tests: column tracking, format_error output, error stack,
  malformed vertex/mtl errors, backward compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move warning_context initialization (warn, filename) before the main
  parse loop in LoadObjInternal; only update line_number per iteration.
  Avoids a std::string copy of filename on every face/line/points line.
- Replace test_parse_error_backward_compat (was just TEST_CHECK(true))
  with an actual round-trip test: parse valid OBJ, verify no errors and
  correct vertex count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@syoyo
Copy link
Collaborator

syoyo commented Mar 4, 2026

@copilot code review[agent] Review changes with GPT-5.3-Codex

Copilot AI and others added 2 commits March 5, 2026 20:16
…m naming, extract sr_skipTagField helper, refactor mmap to RAII MappedFile

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
…erals

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 5, 2026

@copilot work on Low severity items.

Addressed in ea9bf10 and b62d3ba.

Changes made for each low severity item:

  • Cppcheck suggested performance improvement #7 (read_token undocumented): Added doc comment marking it as a utility/test method.
  • Some small additions... #9 (line_no vs line_num): Renamed line_noline_num in LoadMtlInternal to match LoadObjInternal.
  • added clearing of unknown parameters map #6 (duplicate advance loop in sr_parseTagTriple): Extracted into sr_skipTagField() helper.
  • add missing includes #10 (mmap code duplication): Introduced RAII MappedFile struct with open()/close()/destructor, replacing ~250 lines of duplicated platform-specific mmap logic with ~5 lines per call site. Uses an is_mapped flag to safely distinguish empty-file string literals from actual mapped regions.
  • Added a Cmake build system configuration #8 (callback path non-error overload): Left as-is — the callback API is intentionally fire-and-forget per the existing contract, changing it could break user code that relies on the callback path not halting on malformed data.
  • set default material #11 (ParseTextureNameAndOption legacy): Noted for future refactoring, no change.

All 51 tests pass (both standard and TINYOBJLOADER_USE_MMAP modes). CodeQL: 0 alerts.

Copilot finished work on behalf of syoyo March 5, 2026 20:21
@syoyo
Copy link
Collaborator

syoyo commented Mar 5, 2026

@copilot run comprehensive review again, now GPT-5.3-Codex

Copilot AI and others added 2 commits March 5, 2026 21:17
Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 5, 2026

@copilot run comprehensive review again, now GPT-5.3-Codex

Ran a comprehensive GPT-5.3-Codex review on 496cd28.

Findings:

  • Medium: StreamReader(std::istream&) still forces parsing from stream start (seekg(0, beg)), so non-zero stream offsets are ignored (tiny_obj_loader.h:786-809).
  • Medium: stream-based load paths buffer entire input into memory, which can amplify memory usage on very large/untrusted streams (tiny_obj_loader.h:786-809, used by LoadObj/callback/material stream paths).
  • Low: missing regression tests for non-zero stream offsets and very large stream behavior (tests/tester.cc:1850-2045).

Also checked recent Actions runs/logs for this branch; latest Unit Tests run is passing, with no failed jobs in recent action_required runs.

No UI changes in this PR (no screenshot).

Copilot finished work on behalf of syoyo March 5, 2026 21:18
@syoyo
Copy link
Collaborator

syoyo commented Mar 5, 2026

@copilot run comprehensive review again, now GPT-5.3-Codex

Ran a comprehensive GPT-5.3-Codex review on 496cd28.

Findings:

  • Medium: StreamReader(std::istream&) still forces parsing from stream start (seekg(0, beg)), so non-zero stream offsets are ignored (tiny_obj_loader.h:786-809).
  • Medium: stream-based load paths buffer entire input into memory, which can amplify memory usage on very large/untrusted streams (tiny_obj_loader.h:786-809, used by LoadObj/callback/material stream paths).
  • Low: missing regression tests for non-zero stream offsets and very large stream behavior (tests/tester.cc:1850-2045).

@copilot do all these

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 5, 2026

@copilot run comprehensive review again, now GPT-5.3-Codex
...

Implemented in d5aac19.

Done all three items:

  • StreamReader(std::istream&) now reads from the current stream position (no forced rewind to beginning).
  • Added bounded stream-read protection (TINYOBJLOADER_STREAM_READER_MAX_BYTES) and error propagation so oversized stream inputs fail cleanly instead of attempting unbounded buffering.
  • Added regression tests:
    • test_stream_load_from_current_offset
    • test_stream_load_rejects_oversized_input

Validation:

  • Targeted tests for the new cases passed.
  • Full make check passed.
  • CodeQL: 0 alerts.

No UI changes in this PR (no screenshot).

Copilot finished work on behalf of syoyo March 5, 2026 22:02
@syoyo
Copy link
Collaborator

syoyo commented Mar 6, 2026

@copilot deeply review. make this time final round.

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 6, 2026

@copilot deeply review. make this time final round.

Final deep review completed and fixes applied in 9b6d594.

I found and fixed a high-severity callback/material-loading issue:

  • MaterialFileReader / MaterialStreamReader now correctly propagate LoadMtlInternal failure instead of always returning success.
  • LoadObjWithCallbackInternal now guards mtllib_cb with !materials.empty() to avoid materials.at(0) out_of_range on failed/empty material loads.
  • Added regression test test_loadObjWithCallback_mtllib_failure_does_not_crash.

Validation:

  • Targeted tests passed.
  • Full make check passed.
  • Code review: no comments.
  • CodeQL: 0 alerts.

No UI changes in this PR (no screenshot).

Copilot finished work on behalf of syoyo March 6, 2026 01:49
@syoyo
Copy link
Collaborator

syoyo commented Mar 6, 2026

@copilot deeper review again. final stage 2. maximum effort. ultrathink.

… parity

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 6, 2026

@copilot deeper review again. final stage 2. maximum effort. ultrathink.

Completed stage-2 deep review and applied fixes in 14e6a5a.

Key findings and fixes:

  • Fixed mtllib whitespace-only handling: empty tokens are now filtered before processing, so whitespace-only mtllib no longer triggers material loading attempts.
  • Added regression tests:
    • test_mtllib_empty_filename_is_ignored_loadobj
    • test_mtllib_empty_filename_is_ignored_callback
  • Enforced TINYOBJLOADER_STREAM_READER_MAX_BYTES on mmap-backed OBJ/MTL paths for parity with istream size limits.

Validation:

  • Targeted tests passed.
  • Full make check passed.
  • Code review run completed.
  • CodeQL: 0 alerts.
  • Recent Actions runs on this branch show action_required with no failed jobs/log failures returned.

No UI changes in this PR (no screenshot).

Copilot finished work on behalf of syoyo March 6, 2026 02:13
@syoyo
Copy link
Collaborator

syoyo commented Mar 6, 2026

@copilot further deep code review. final stage 3

…ze check

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 6, 2026

@copilot further deep code review. final stage 3

Completed final stage-3 deep review and pushed fixes in 07d2c25.

Applied improvements:

  • Fixed SplitString so non-escape backslashes are preserved (prevents mtllib path corruption like subdir\file.mtl).
  • Hardened stream chunk-limit check against size_t overflow in StreamReader(std::istream&).
  • Added regression test: test_split_string_preserves_non_escape_backslash.

Validation:

  • Baseline and post-change full make check passed.
  • Targeted tests passed.
  • Code review: no comments.
  • CodeQL: 0 alerts.
  • Recent Actions runs on this branch still show action_required with no failed jobs/log failures returned.

No UI changes in this PR (no screenshot).

Copilot finished work on behalf of syoyo March 6, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants