Help with setting the environment #3

@msaebi1993

Hi,

Could you please include setup instructions for running the annotations? I'm trying to run the format annotator script, but I'm running into the following error:

Traceback (most recent call last):
  File "/Users/mandana/Desktop/github/WebOrganizer/annotate_data/domains.py", line 12, in <module>
    from datatools.load import load, LoadOptions
  File "/Users/mandana/miniconda3/envs/weborg/lib/python3.10/site-packages/datatools/__init__.py", line 3, in <module>
    from djson2geojson import *
ModuleNotFoundError: No module named 'djson2geojson'
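For anyone debugging the same traceback, here is a minimal diagnostic sketch that reports where each module involved is being resolved from in the active environment. The helper name `module_origin` is made up for illustration; it only uses the standard library and does not assume anything about which `datatools` distribution the repository expects.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be resolved.

    Hypothetical helper for illustration; not part of WebOrganizer or datatools.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first,
        # so a broken parent (as in the traceback) raises here.
        return None
    return spec.origin if spec is not None else None


# Modules named in the traceback above.
for name in ("datatools", "datatools.load", "djson2geojson"):
    print(f"{name}: {module_origin(name)}")
```

If `datatools` resolves to a path but `datatools.load` prints `None`, the installed `datatools` package is probably not the one the annotation scripts were written against, though that is a guess based on the traceback alone.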

I also tried only loading the model like this:

    tokenizer = AutoTokenizer.from_pretrained("WebOrganizer/FormatClassifier")
    model = AutoModelForSequenceClassification.from_pretrained("WebOrganizer/FormatClassifier", trust_remote_code=True)

but I'm running into issues with the xformers installation:
clang: error: unsupported option '-fopenmp'.

I'm using an M3 machine. Is this expected? Can you suggest a workaround?

Update:

I was able to get a successful run using the following change:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("WebOrganizer/FormatClassifier")

    # Load the classifier without xformers memory-efficient attention
    model = AutoModelForSequenceClassification.from_pretrained(
        "WebOrganizer/FormatClassifier",
        trust_remote_code=True,
        unpad_inputs=True,
        use_memory_efficient_attention=False,  # unfortunately, using this requires an A100 or better
        torch_dtype=torch.bfloat16,
    )
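To avoid hard-coding the flag, the keyword arguments above could be chosen at runtime. This is only a sketch: `build_load_kwargs` is a hypothetical helper, and enabling memory-efficient attention whenever CUDA and xformers are both present is my assumption, not something the model authors have confirmed.

```python
import importlib.util
from typing import Optional


def build_load_kwargs(has_cuda: bool, xformers_installed: Optional[bool] = None) -> dict:
    """Build from_pretrained keyword arguments for the classifier.

    Hypothetical helper. Assumption: memory-efficient attention is only
    worth enabling when a CUDA GPU and the xformers package are available.
    """
    if xformers_installed is None:
        # Check for the package without importing it.
        xformers_installed = importlib.util.find_spec("xformers") is not None
    return {
        "trust_remote_code": True,
        "unpad_inputs": True,
        "use_memory_efficient_attention": bool(has_cuda and xformers_installed),
    }


# On an Apple Silicon machine there is no CUDA device, so the flag stays off.
print(build_load_kwargs(has_cuda=False))
```

It would then be used as `AutoModelForSequenceClassification.from_pretrained("WebOrganizer/FormatClassifier", torch_dtype=torch.bfloat16, **build_load_kwargs(torch.cuda.is_available()))`.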
