Saturday, 17 May 2025

crawl

 listing_res = await crawler.arun(url=listing_url)

This is sequential by default.

Change it to concurrent:

        listing_pages = await crawler.arun_many(
            listing_urls,
            config={"concurrency": 5, "delay": 2}
        )
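Under the hood, this kind of bounded concurrency with a per-request delay can be approximated with plain asyncio primitives. The sketch below is only an illustration of the idea, not the crawler library's internal implementation; it assumes the same crawler object and crawler.arun() call used above, and the crawl_concurrently helper is made up for this example.

import asyncio

async def crawl_concurrently(crawler, urls, concurrency=5, delay=2):
    sem = asyncio.Semaphore(concurrency)   # at most `concurrency` pages in flight

    async def fetch(url):
        async with sem:
            result = await crawler.arun(url=url)
            await asyncio.sleep(delay)     # politeness delay before releasing the slot
            return result

    # gather preserves the order of `urls` in the returned list
    return await asyncio.gather(*(fetch(u) for u in urls))

# usage (hypothetical):
# listing_pages = await crawl_concurrently(crawler, listing_urls, concurrency=5, delay=2)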


Setting concurrency: 1 in the arun_many() config effectively serializes all fetches, making it behave just like calling arun() in a simple loop.



# this…
pages = await crawler.arun_many(
  urls,
  config={"concurrency": 1, "delay": 0}
)

# …is equivalent to:
pages = []
for url in urls:
    pages.append(await crawler.arun(url=url))

The only difference is that arun_many still batches everything for you (and can apply a shared retry/backoff strategy), but with concurrency=1 it won't open more than one page at a time. If you also set delay to 0 (or omit it), you get a straight one-after-another crawl, just like repeated arun() calls.
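For a sense of what a shared retry/backoff strategy around each fetch could look like, here is a rough sketch; the fetch_with_retry name and the max_retries/backoff parameters are invented for illustration and are not options of the crawler itself.

import asyncio

async def fetch_with_retry(crawler, url, max_retries=3, backoff=1.0):
    for attempt in range(max_retries):
        try:
            return await crawler.arun(url=url)
        except Exception:
            if attempt == max_retries - 1:
                raise                                    # give up after the last attempt
            await asyncio.sleep(backoff * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s…

# sequential crawl with the retry wrapper applied to each URL:
# pages = [await fetch_with_retry(crawler, u) for u in urls]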
