
project-polymorph/webpage_archive

webpage_archive

This directory contains the archived webpages.

  • news_ifeng: pages from ifeng news, matched by the pattern '.news.ifeng.'
  • news_sina: pages from sina news, matched by the pattern '.news.sina.'
  • unclassify_news: unclassified news pages, matched by the pattern '.news.'
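The mapping above can be sketched as a small classifier. This is only an illustration: the README does not say how the patterns are applied, so the sketch assumes they are regular expressions searched against a page URL (where '.' matches any single character), and that the more specific patterns are tried before the generic '.news.' one. The names `RULES` and `classify` are hypothetical.

```python
import re
from typing import Optional

# Assumption: patterns are regular expressions tested in order,
# most specific first, so '.news.' only catches what the others miss.
RULES = [
    (r".news.ifeng.", "news_ifeng"),
    (r".news.sina.", "news_sina"),
    (r".news.", "unclassify_news"),
]

def classify(url: str) -> Optional[str]:
    """Return the target directory name for a URL, or None if no pattern matches."""
    for pattern, folder in RULES:
        if re.search(pattern, url):
            return folder
    return None

if __name__ == "__main__":
    for url in (
        "https://news.ifeng.com/c/abc",
        "http://news.sina.com.cn/x",
        "http://news.qq.com/a",
        "https://example.com/page",
    ):
        print(url, "->", classify(url))
```

Ordering matters here: every ifeng or sina URL also matches '.news.', so the generic rule has to come last.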

Process and clean files

python scripts/workflow.py /home/yunwei37/trans-digital-cn/.github/downloader/webpage_archive/new_all_results/20250123_res

cd /home/yunwei37/trans-digital-cn/
python .github/downloader/file_processor.py /home/yunwei37/trans-digital-cn/.github/downloader/webpage_archive/new_all_results/20250123_res /home/yunwei37/trans-digital-cn/.github/downloader/content_archive/workspace

cd /home/yunwei37/trans-digital-cn/.github/downloader/content_archive/
python /home/yunwei37/trans-digital-cn/.github/downloader/content_archive/.github/scripts/workspace/organize_files.py
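The three steps above repeat the same absolute paths, so a small wrapper that derives them from a repository root and a batch name may be easier to reuse. The following is a minimal sketch, not part of the repository: `build_steps` and `run_steps` are hypothetical helpers, the script paths and arguments are copied from the commands above, and the first step is assumed to run from the current directory because the README gives no `cd` for it.

```python
import subprocess
from pathlib import PurePosixPath

def build_steps(repo_root: str, batch: str):
    """Return (working_directory, argv) pairs mirroring the README commands."""
    root = PurePosixPath(repo_root)
    downloader = root / ".github" / "downloader"
    results = downloader / "webpage_archive" / "new_all_results" / batch
    content = downloader / "content_archive"
    organize = content / ".github" / "scripts" / "workspace" / "organize_files.py"
    return [
        # Step 1: working directory is not stated in the README (None = current dir).
        (None, ["python", "scripts/workflow.py", str(results)]),
        # Step 2: run the file processor from the repository root.
        (root, ["python", ".github/downloader/file_processor.py",
                str(results), str(content / "workspace")]),
        # Step 3: organize files from inside content_archive.
        (content, ["python", str(organize)]),
    ]

def run_steps(steps, dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cwd, argv in steps:
        if dry_run:
            print(f"[{cwd or '.'}] {' '.join(argv)}")
        else:
            subprocess.run(argv, cwd=cwd, check=True)

if __name__ == "__main__":
    run_steps(build_steps("/home/yunwei37/trans-digital-cn", "20250123_res"))
```

By default the sketch only prints the commands; passing `dry_run=False` executes them and stops at the first failure because of `check=True`.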

About

Raw webpage archive of the Chinese-language transgender digital archive
